DOCUMENT RESUME 



ED 214 768                                              SE 036 459

AUTHOR         Pfeiffer, Paul E.
TITLE          Conditional Independence in Applied Probability.
INSTITUTION    Education Development Center, Inc., Newton, Mass.
SPONS AGENCY   National Science Foundation, Washington, D.C.
PUB DATE       79
GRANT          SED76-19615-A02
NOTE           162p.; For related documents, see SE 036 458, SE 036
               466, and SE 036 468-469.
EDRS PRICE     MF01 Plus Postage. PC Not Available from EDRS.
DESCRIPTORS    *College Mathematics; Higher Education;
               *Instructional Materials; Learning Modules;
               *Mathematical Applications; Mathematical Concepts;
               *Probability; *Problem Solving; Supplementary Reading
               Materials; Textbooks

ABSTRACT

This material assumes the user has the background provided by a good
undergraduate course in applied probability. It is felt that
introductory courses in calculus, linear algebra, and perhaps some
differential equations should provide the requisite experience and
proficiency with mathematical concepts, notation, and argument. The
document is divided into five major sections, each concluding with a
set of exercises. The major parts are entitled: (A) Preliminaries;
(B) Conditional Independence of Events; (C) Conditional Expectations;
(D) Conditional Independence, Given a Random Vector; and (E) Markov
Processes and Conditional Independence. The document includes three
appendices, a brief list of references, and a presentation of selected
answers, hints, and key steps. (MP)



***********************************************************************
*   Reproductions supplied by EDRS are the best that can be made      *
*                   from the original document.                       *
***********************************************************************

U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person or
organization originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not necessarily
represent official NIE position or policy.




CONDITIONAL
INDEPENDENCE
IN APPLIED
PROBABILITY

Paul E. Pfeiffer
Department of Mathematical Sciences
Rice University
Houston, Texas

"PERMISSION TO REPRODUCE THIS
MATERIAL IN MICROFICHE ONLY
HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."

UMAP, 55 Chapel St., Newton, Mass. 02160



THE UMAP EXPOSITORY MONOGRAPH SERIES

SPATIAL MODELS OF ELECTION COMPETITION
Steven J. Brams, New York University

ELEMENTS OF THE THEORY OF GENERALIZED INVERSES FOR MATRICES
Randall E. Cline, University of Tennessee

INTRODUCTION TO POPULATION MODELING
James C. Frauenthal, SUNY at Stony Brook

CONDITIONAL INDEPENDENCE IN APPLIED PROBABILITY
Paul E. Pfeiffer, Rice University

This material was prepared with the support of National Science
Foundation Grant No. SED76-19615 A02. Recommendations
expressed are those of the author and do not necessarily reflect
the views of the NSF, or of the copyright holder.





umap
Modules and Monographs in
Undergraduate Mathematics
and its Applications Project

CONDITIONAL
INDEPENDENCE
IN APPLIED
PROBABILITY

Paul E. Pfeiffer
Department of Mathematical Sciences
Rice University
Houston, Texas

The Project acknowledges Robert M. Thrall,
Chairman of the UMAP Monograph Editorial
Board, for his help in the development and
review of this monograph.

Modules and Monographs in Undergraduate Mathematics and its Applications Project

The goal of UMAP is to develop, through a community of users and developers, a
system of instructional modules and monographs in undergraduate mathematics and
its applications which may be used to supplement existing courses and from which
complete courses may eventually be built.

The Project is guided by a National Steering Committee of mathematicians,
scientists, and educators. UMAP is funded by a grant from the National Science
Foundation to Education Development Center, Inc., a publicly supported, nonprofit
corporation engaged in educational research in the U.S. and abroad.

UMAP wishes to thank Charles Harvey of Rice University for his review of this
manuscript.

The Project acknowledges the help of the Monograph Editorial Board in the
development and review of this monograph. Members of the Monograph Editorial
Board include:



Clayton Aucoin                          Clemson University
  Chairman, Sept. 1979 -
Robert M. Thrall                        Rice University
  Chairman, June 1976-Sept. 1979
James C. Frauenthal                     SUNY at Stony Brook
Helen Marcus-Roberts                    Montclair State College
Ben Noble                               University of Wisconsin
Paul C. Rosenbloom                      Columbia University

Ex-officio members:

Michael Anbar                           SUNY at Buffalo
G. Robert Boynton                       University of Iowa
Charles P. Frahm                        Illinois State University
Kenneth R. Rebman                       California State University
Carroll O. Wilde                        Naval Postgraduate School
Douglas A. Zahn                         Florida State University

Project administrative staff:

Ross L. Finney                          Director
Solomon Garfunkel                       Associate Director/Consortium Coordinator
Felicia DeMay                           Associate Director for Administration
Barbara Kelczewski                      Coordinator for Materials Production



Copyright ©1979 by Education Development Center, Inc. All rights reserved.
Printed in the United States of America. No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without
the prior written permission of the publisher.



CONTENTS

PREFACE

A. PRELIMINARIES
   1. Probability Spaces and Random Vectors
   2. Mathematical Expectation
   3. Problems

B. CONDITIONAL INDEPENDENCE OF EVENTS
   1. The Concept
   2. Some Patterns of Probable Inference
   3. A Classification Problem
   4. Problems

C. CONDITIONAL EXPECTATION
   1. Conditioning by an Event
   2. Conditioning by a Random Vector-Special Cases
   3. Conditioning by a Random Vector-General Case
   4. Properties of Conditional Expectation
   5. Conditional Distributions
   6. Conditional Distributions and Bayes' Theorem
   7. Proofs of Properties of Conditional Expectation
   8. Problems

D. CONDITIONAL INDEPENDENCE, GIVEN A RANDOM VECTOR
   1. The Concept and Some Basic Properties
   2. Some Elements of Bayesian Analysis
   3. A One-Stage Bayesian Decision Model
   4. A Dynamic-Programming Example
   5. Proofs of the Basic Properties
   6. Problems

E. MARKOV PROCESSES AND CONDITIONAL INDEPENDENCE
   1. Discrete-Parameter Markov Processes
   2. Markov Chains with Costs and Rewards
   3. Continuous-Parameter Markov Processes
   4. The Chapman-Kolmogorov Equation
   5. Proof of a Basic Theorem on Markov Processes
   6. Problems

APPENDICES
   Appendix I. Properties of Mathematical Expectation
   Appendix II. Properties of Conditional Expectation, Given a Random Vector
   Appendix III. Properties of Conditional Independence, Given a Random Vector

REFERENCES

SELECTED ANSWERS, HINTS, AND KEY STEPS



Preface

It would be difficult to overestimate the importance of stochastic
independence in both the theoretical development and the practical
applications of mathematical probability. The concept is grounded in the
idea that one event does not "condition" another, in the sense that
occurrence of one does not affect the likelihood of the occurrence of the
other. This leads to a formulation of the independence condition in terms
of a simple "product rule," which is amazingly successful in capturing the
essential ideas of independence.

However, there are many patterns of "conditioning" encountered in
practice which give rise to quasi independence conditions. Explicit and
precise incorporation of these into the theory is needed in order to make
the most effective use of probability as a model for behavioral and
physical systems. We examine two concepts of conditional independence.

The first concept is quite simple, utilizing very elementary aspects
of probability theory. Only algebraic operations are required to obtain
quite important and useful new results, and to clear up many ambiguities
and obscurities in the literature.

The second concept of conditional independence has been employed for
some time in advanced treatments of Markov processes. Couched in terms
of the abstract notion of conditional expectation, given a sigma field,
this concept has been available only to those with the requisite
measure-theoretic preparation. Since the use of this concept in the
theory of Markov processes not only yields important mathematical results,
but also provides conceptual advantages for the modeler, it should be
made available to a wider class of users. The case is made more compelling
by the fact that the concept, once available, has served to provide new
precision and insight into the handling of a number of topics in probable
inference and decision, not related directly to Markov processes.

The reader is assumed to have the background provided by a good
undergraduate course in applied probability (see Secs A1, A2). Introductory
courses in calculus, linear algebra, and perhaps some differential equations
should provide the requisite experience and proficiency with mathematical
concepts, notation, and argument. In general, the mathematical maturity
of a junior or senior student in mathematical sciences, engineering, or
one of the physical sciences should be adequate, although the reader need
not be a major in any of these fields.

Considerable attention is given to careful mathematical development.
This serves two types of interests, which may enhance and complement one
another. The serious practitioner of the art of utilizing mathematics
needs insight into the system he is studying. He also needs insight into
the model he is using. He needs to distinguish between properties of the
model which are definitive or axiomatic (and hence appear as basic
assumptions) and those which are logical consequences (i.e., theorems)
deduced from the axiomatic properties. For example, if his experience makes
it reasonable to assume that a dynamic system is characterized by lack of
"memory", so that the future is conditioned only by the present state and
not past history, then it is appropriate to consider representing the
system as a Markov process. Should the system fail to exhibit certain
consequences of the Markov assumption, then that fundamental assumption
must be reexamined. The distinction between fundamental properties and
derived properties is an aid to efficient and intelligent use of mathematics
(as well as insurance against contradictory assumptions).


The serious mathematician who wishes to enlarge his knowledge and
appreciation of the applications of mathematics (and perhaps discover new,
significant problems) may be deterred by the inadequate articulation of
mathematics in much of the applied literature. This may be a serious
barrier to what should be a cooperative endeavor. Hopefully, the
present treatment will help remove any such barrier to consideration
of the interesting and important topic of conditional independence.

In order to recast the theory of conditional independence of random
vectors in more elementary terms, it has been necessary to extend the
usual introductory treatment of conditional expectation, given a random
vector. The treatment intends to bridge the gap between the usual intuitive
introductory treatment, based on a concept of conditional distribution, and
a more general approach found in advanced, measure-theoretic treatments.
Because of the importance of conditional expectation as a tool in the study
of random processes and of decision theory, the results should be useful
beyond the scope of the present investigation.

Acknowledgements

It is apparent that a work of this sort draws on a variety of sources,
many of which are no longer identifiable. Much of the impetus for writing
came from teaching courses in probability, random processes, and operations
research. The response of students and colleagues to various presentations
has been helpful in many ways. The development of the concept of conditional
independence of events has been stimulated and shaped in large part by
my collaboration with David A. Schum, Professor of Psychology, in some
aspects of his work on human inference. He has read critically several
versions of the manuscript. Charles M. Harvey of Dickinson College, while
on visiting appointment in Mathematical Sciences at Rice University, read
critically a preliminary manuscript presented for review. His comments
were helpful in planning the final, extensively revised manuscript.
Dr. David W. Scott of Baylor College of Medicine and Rice University used
some of the results in recent work. His comments were helpful in improving
exposition at several points, and his work provided an interesting applications
problem.

Paul E. Pfeiffer




CONDITIONAL INDEPENDENCE IN APPLIED PROBABILITY 
A. Preliminaries 

In this monograph, we assume the reader has reasonable facility with
elementary probability at the level of such texts as Pfeiffer and Schum
[1973], Ash [1970], or Chung [1974]. In particular, we suppose the reader
is familiar with the concept of a random variable, or a random vector, as a
mapping from the basic space to the real line or to Euclidean space
$R^n$, and with the notion of mathematical expectation and its basic
properties (cf Pfeiffer and Schum [1973], Chaps 8, 10, 13). In the following
sections, we summarize various fundamental concepts and results in a form,
terminology, and notation to be utilized in subsequent developments. In
some cases, we simply express familiar material in a form useful for our
purposes; in others, we supplement the usual introductory treatment,
especially with an informal presentation of certain ideas and results from
measure theory. The reader may wish to scan this material rapidly,
returning as needed for later reference.
1. Probability spaces and random vectors

A probability space, or probability system, consists of a triple
$(\Omega, \mathcal{F}, P)$.

1) $\Omega$ is the basic space, or sample space, each element of which
   represents one of the conceptually possible outcomes of a specified
   trial, or experiment. Each elementary outcome $\omega$ is an element
   of the basic space $\Omega$.

2) $\mathcal{F}$ is a class of subsets of $\Omega$. Each of the subsets in
   this class is an event. The event A occurs iff the $\omega$ resulting
   from the trial is an element of A. Since it is desirable that the sets
   formed by complements, countable unions, or countable intersections of
   events also be events, the class $\mathcal{F}$ must have the properties
   of a sigma field (also called a Borel field or a sigma algebra) of sets.

3) The probability measure P assigns to each event A a number P(A)
   in such a manner that three basic axioms (and logical consequences)
   hold: i) $P(A) \ge 0$, ii) $P(\Omega) = 1$, and iii) P is countably
   additive.
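For a finite basic space, the three ingredients of a probability space can be checked by direct enumeration. The following is a minimal sketch under assumptions that are ours, not the monograph's: the trial is one roll of a fair die, the class of events is the class of all subsets, and the names `omega`, `power_set`, and `prob` are illustrative only. Countable additivity reduces here to additivity over disjoint pairs.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: one roll of a fair die.
omega = frozenset(range(1, 7))

def power_set(s):
    """All subsets of a finite set; this class is a sigma field on s."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

events = power_set(omega)

def prob(event):
    """Equally likely outcomes: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

# Closure of the class of events under complement and union.
closed_complement = all(omega - a in events for a in events)
closed_union = all(a | b in events for a in events for b in events)

# Axioms i) and ii), plus additivity on disjoint pairs of events.
nonnegative = all(prob(a) >= 0 for a in events)
total_one = prob(omega) == 1
additive = all(prob(a | b) == prob(a) + prob(b)
               for a in events for b in events if not (a & b))
```

Exact rational arithmetic (`Fraction`) is used so that the additivity check is an equality test rather than a floating-point comparison.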

We utilize standard notation for the empty set (impossible event),
complements, unions, and intersections. Thus, for example,

   $\emptyset$ is the empty set (the impossible event),

   $\bigcup_{i=1}^{\infty} A_i$ is the union of the infinite class
   $\{A_i : 1 \le i < \infty\}$,

   $\bigcap_{i=1}^{n} B_i$ is the intersection of the finite class
   $\{B_i : 1 \le i \le n\}$,

   $A^c$ is the complement of the set A.

In addition, we employ the notation $\biguplus_{i=1}^{n} B_i$ to indicate
not only that we have taken the union of the class
$\{B_i : 1 \le i \le n\}$, but also that the class is disjoint (the events
are mutually exclusive). Thus, the expression
$A = \biguplus_{i=1}^{\infty} A_i$ means the same as the pair of statements

   i) $A = \bigcup_{i=1}^{\infty} A_i$ and ii) $A_i A_j = \emptyset$ for $i \ne j$.



A random vector is viewed as a mapping from the basic space $\Omega$ to
n-dimensional Euclidean space $R^n$. For n = 1, we have a real-valued
random variable. A random vector $X: \Omega \to R^n$ may be considered to
be the joint mapping $(X_1, X_2, \ldots, X_n): \Omega \to R \times R \times
\cdots \times R$ produced by the coordinate random variables
$X_1, X_2, \ldots, X_n$.

Since we want to be able to make probability statements about possible
sets of values to be taken on by random vectors, we must introduce
measurability considerations. In the real-valued case (n = 1), we should
like to speak of the probability that X takes on a value no greater
than some real number t. Since probability is assigned to events, the
set $\{\omega : X(\omega) \le t\}$ should be an event for any real number
t. This may be viewed schematically with the aid of a mapping diagram, as
in Figure A1-1. We are interested in the set A of those elementary
outcomes $\omega$ which are mapped into the interval $I_t = (-\infty, t]$.
Since we also want to consider complements, countable unions, and
countable intersections of such events, we must consider complements,
countable unions, and countable intersections of such intervals on the
real line. We are thus led to consider the minimal sigma field
$\mathcal{B}$ of subsets of the real line which includes all the
semi-infinite intervals of the form $I_t$. This is the class $\mathcal{B}$
of Borel sets on the real line. A similar consideration leads to defining
the class $\mathcal{B}^n$ of Borel sets on $R^n$ as the minimal sigma field
which includes the semi-infinite intervals of the form
$I(t_1, t_2, \ldots, t_n) = (-\infty, t_1] \times (-\infty, t_2] \times
\cdots \times (-\infty, t_n]$. We say that $X: \Omega \to R^n$ is a random
vector iff $X^{-1}(M) = \{\omega : X(\omega) \in M\}$ is an event for each
Borel set M in $R^n$. A standard result of measure theory, which we assume
without proof, is that $X^{-1}(M)$ is an event for each Borel set M iff
$X^{-1}[I(t_1, t_2, \ldots, t_n)]$ is an event for each n-tuple
$(t_1, t_2, \ldots, t_n)$ of real numbers (i.e., for each element of
$R^n$). Real-valued random variables are included as the special case
n = 1.

It is an easy consequence of elementary mapping theorems that the
class $\mathcal{F}(X)$ of all inverse images $X^{-1}(M)$ of Borel sets is
a sigma field. We refer to this class as the sigma field determined by X.
It must be a subclass of the class $\mathcal{F}$ of events in order for X
to be a random vector.
We often need to consider functions of random vectors. If
$X: \Omega \to R^n$ and $g: R^n \to R^m$, then $Z = g \circ X = g(X)$ is a
function $\Omega \to R^m$. If g has the property that $N = g^{-1}(M)$ is a
Borel set in $R^n$ for each Borel set M in its codomain $R^m$, then Z is a
random vector, since $Z^{-1}(M) = X^{-1}[g^{-1}(M)] = X^{-1}(N)$ is an
event. Thus, each event determined by Z is an event determined by X. This
may be expressed by the relation $\mathcal{F}(Z)$ is contained in
$\mathcal{F}(X)$. This condition is often indicated by saying that Z is
measurable with respect to X (or Z is measurable-X). A function g with the
mapping property described above is known as a Borel function. From
somewhat advanced arguments, it is known that if Z is measurable-X, then
there is a Borel function g such that $Z = g \circ X = g(X)$. We assume
this important result without proof.
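On a finite basic space, the sigma field determined by a random variable can be listed explicitly, and the containment of the sigma field determined by Z = g(X) in that determined by X can be verified by enumeration. The sketch below is illustrative only; the basic space, the functions `X` and `g`, and the helper `sigma_field` are assumptions introduced for the example.

```python
from itertools import combinations

OMEGA = range(1, 7)                       # hypothetical basic space: die outcomes

def sigma_field(f):
    """Class of all inverse images f^{-1}(M), M a subset of the range of f."""
    values = sorted({f(w) for w in OMEGA})
    return {frozenset(w for w in OMEGA if f(w) in set(c))
            for r in range(len(values) + 1)
            for c in combinations(values, r)}

def X(w):
    return w % 3                          # takes the values 0, 1, 2

def g(t):
    return min(t, 1)                      # merges the values 1 and 2

def Z(w):
    return g(X(w))                        # Z = g(X)

F_X = sigma_field(X)                      # 2**3 = 8 events
F_Z = sigma_field(Z)                      # 2**2 = 4 events
```

Since g merges two values of X, Z determines fewer events than X, and every event determined by Z appears among those determined by X (that is, `F_Z <= F_X`).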

We have introduced the class of Borel functions in a somewhat abstract
manner to solve the problem of when a function of a random vector is itself
a random vector. But how do we know whether or not a function encountered
in practice is Borel? It turns out that almost any function
$g: R^n \to R^m$ which we may want to consider is Borel. For this reason,
in many introductory treatments little or nothing is said about Borel
functions.

Borel functions constitute a generalization of the class of continuous
functions. Continuous functions have the property that the inverse image
of any open set is open. It is known that the class of Borel sets on $R^n$
is the minimal sigma field which includes all open sets in $R^n$. From this
fact it may be shown that any continuous function from $R^n$ to $R^m$ is
Borel. Any piecewise continuous real function $g: R \to R$ is Borel. Linear
combinations, products, and compositions (functions of functions) of Borel
functions are Borel. If $\{g_n : 1 \le n\}$ is a sequence of Borel
functions from $R^n$ to $R^m$ which converge for each t in $R^n$, the limit
function g is a Borel function.

The indicator function $I_A$ for set A in $\Omega$, defined by
$I_A(\omega) = 1$ for $\omega$ in A and zero otherwise, is particularly
useful. If A is an event, $I_A$ is a random variable. Indicator functions
may be defined, as well, on $R^n$. If M is a Borel set in $R^n$, then
$I_M$ is a Borel function from $R^n$ to R. If c is an element of $R^m$,
then $c I_M$ is a Borel function from $R^n$ to $R^m$. If X is a random
vector and M is a Borel set on the codomain of X, then $I_M(X)$ is a
real-valued random variable, measurable-X. If M is a subset of $R^m$ and N
is a subset of $R^n$, then the cartesian product
$M \times N = \{(t,u) : t \in M, u \in N\}$ is a subset of
$R^m \times R^n$. The indicator function
$I_{M \times N}: R^m \times R^n \to R$ satisfies the equation

   $I_{M \times N}(t, u) = I_M(t) I_N(u)$,

since $(t,u) \in M \times N$ iff both $t \in M$ and $u \in N$.
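The product rule for indicator functions of cartesian products can be spot-checked numerically. This is a small sketch, not part of the original development; the sets M = [0, 1] and N = (2, 5) and the helper `indicator` are hypothetical choices made for illustration.

```python
def indicator(member):
    """Return the indicator function of a set described by a membership test."""
    return lambda *point: 1 if member(*point) else 0

# Hypothetical sets: M = [0, 1] in R and N = (2, 5) in R.
I_M = indicator(lambda t: 0 <= t <= 1)
I_N = indicator(lambda u: 2 < u < 5)
I_MxN = indicator(lambda t, u: (0 <= t <= 1) and (2 < u < 5))

# Compare both sides of the identity on a grid of points (t, u).
grid = [(t / 4, u / 4) for t in range(-4, 9) for u in range(0, 25)]
identity_holds = all(I_MxN(t, u) == I_M(t) * I_N(u) for t, u in grid)
```

A grid check of this kind illustrates the identity but does not prove it; the proof is the one-line argument given in the text.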

The following result is basic in the development of the concept of
conditional expectation.

Theorem A1-1

a) If Y is a random vector with codomain $R^m$, M is any Borel set in
   $R^m$, and $C = \{\omega : Y(\omega) \in M\} = Y^{-1}(M)$, then
   $I_C = I_M(Y)$.

b) If g is a Borel function $R^m \to R^n$ and Z = g(Y), then for any
   Borel set N in $R^n$, there is a Borel set $M = g^{-1}(N)$ in $R^m$
   such that $I_N(Z) = I_M(Y)$.

PROOF

a) $I_M[Y(\omega)] = 1$ iff $Y(\omega) \in M$ iff $\omega \in C$ iff
   $I_C(\omega) = 1$.

b) The relation $C = Y^{-1}(M) = Z^{-1}(N)$ is an elementary property of
   composite mappings. By a), $I_N(Z) = I_C = I_M(Y)$. []

The indicator function is useful in representing discrete random
variables, which take on a finite or countably infinite set of values. In
the finite case, the term simple random variable is commonly used. Suppose
the range (set of possible values) of X is
$S = \{t_1, t_2, \ldots, t_N\} \subset R^n$. Let
$A_i = \{\omega : X(\omega) = t_i\}$. Then the class
$\{A_i : 1 \le i \le N\}$ is a partition, and

   $X = \sum_{i=1}^{N} t_i I_{A_i}$.

We refer to this representation as canonical form (if one of the values is
zero, we include a term with zero coefficient). It is easy to show that any
real random variable is the limit of a sequence of such simple random
variables (the sequence is not unique). If X is nonnegative, it is the
limit of an increasing sequence of nonnegative, simple random variables
(cf Pfeiffer and Schum [1973], Sec 8.8).

Similar statements may be made about Borel functions. A simple Borel
function $g: R^m \to R^n$ has canonical form
$g = \sum_{i=1}^{N} t_i I_{M_i}$, where $t_i \in R^n$ and each
$M_i = \{u \in R^m : g(u) = t_i\}$ is a Borel set in $R^m$.
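The canonical form of a simple random variable can be built mechanically from its values. The sketch below assumes a hypothetical six-point basic space and a hypothetical random variable `X`; it constructs the partition into the sets on which X is constant and confirms that the indicator sum reproduces X.

```python
from collections import defaultdict

omega = [1, 2, 3, 4, 5, 6]               # hypothetical basic space

def X(w):
    return (w - 3) ** 2                  # values 4, 1, 0, 1, 4, 9

# Partition: A_i = {w : X(w) = t_i} for each distinct value t_i.
partition = defaultdict(set)
for w in omega:
    partition[X(w)].add(w)

def canonical(w):
    """Evaluate sum_i t_i * I_{A_i}(w)."""
    return sum(t * (1 if w in A else 0) for t, A in partition.items())

matches = all(canonical(w) == X(w) for w in omega)
```

Note that the value zero appears as a key of `partition` with coefficient zero, in line with the convention stated in the text.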

A random vector induces a probability distribution on the Borel sets
of its codomain. To each Borel set M is assigned the probability mass
on the event $X^{-1}(M)$. A probability measure $P_X$ is defined on the
Borel sets by the assignment $P_X(M) = P[X^{-1}(M)] = P(X \in M)$. This is
a true probability measure, with the Borel sets serving as events. This
mass distribution may also be described by a probability distribution
function $F_X$ or, in suitable cases, by a probability density function
$f_X$. These matters are assumed to be familiar.
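The induced measure is easy to compute when the basic space is finite. The following sketch assumes, for illustration only, a trial consisting of two fair dice with X the total showing; the names `P_X` and `F_X` mirror the notation of the text but the functions themselves are our own.

```python
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # two fair dice
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    return w[0] + w[1]                   # total showing

def P_X(M):
    """Induced measure: P_X(M) = P(X^{-1}(M)) = P(X in M)."""
    return sum(P[w] for w in omega if X(w) in M)

def F_X(t):
    """Distribution function F_X(t) = P_X((-inf, t])."""
    return sum(P[w] for w in omega if X(w) <= t)
```

For instance, `P_X({7})` collects the mass of the six outcomes whose total is seven, and `F_X(3)` collects the mass of the three outcomes with total two or three.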
For many purposes, if a random vector is modified on a set of $\omega$
having zero probability, no significant difference is realized in
probability calculations. For example, if X and Y are two real random
variables with the property that the set of $\omega$ for which
$X(\omega) \ne Y(\omega)$ has probability zero, these random variables have
the same mathematical expectation.

DEFINITION. Random vectors X, Y are almost surely equal, denoted
X = Y a.s., iff they have the same codomain and the set
$\{\omega : X(\omega) \ne Y(\omega)\}$ has probability zero.

More generally, a relation between random vectors is said to hold almost
surely (a.s.), or to hold for almost every (a.e.) $\omega$, iff the set of
$\omega$ for which the relation fails to hold has probability zero.

We are frequently concerned with functions of random vectors.
Suppose we have random vector $X: \Omega \to R^n$ and have two Borel
functions $g, h: R^n \to R^m$. If these functions have the property that
g(t) = h(t) for all t on the range of X, then we must have
$g[X(\omega)] = h[X(\omega)]$ for all $\omega$. Again, we may not need this
equality for all $\omega$. It may be sufficient to have equality for almost
every $\omega$ (i.e., for all $\omega$ except possibly an exceptional set
of probability zero). Suppose $M_0 = \{t \in R^n : g(t) \ne h(t)\}$. Then
$g[X(\omega)] \ne h[X(\omega)]$ iff $X(\omega)$ is one of the values in
$M_0$. Hence, g(X) = h(X) a.s. iff the set of $\omega$ for which
$X(\omega) \in M_0$ has probability zero. But this is just the condition
that the induced probability $P_X(M_0) = P(X \in M_0) = 0$.

The notion of almost-sure equality for random vectors can be extended
to Borel functions when the probability measure is defined on the class
of Borel sets on the domain of the functions. We are particularly
interested in the case that such measures are probability measures
induced by random vectors.

DEFINITION. If g, h are Borel functions from $R^n$ to $R^m$ and $P_X$
is a probability measure on the Borel sets on $R^n$, then g and h
are said to be almost surely equal $[P_X]$ iff the set
$M_0 = \{t \in R^n : g(t) \ne h(t)\}$ satisfies the condition
$P_X(M_0) = 0$.

The discussion above provides the justification for the following

Theorem A1-2

g(X) = h(X) a.s. iff g = h a.s. $[P_X]$, where $P_X$ is the
probability measure induced by the random vector X. []
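A small discrete example may help fix the content of Theorem A1-2. In the sketch below, which rests on a hypothetical three-point space of our own choosing, the functions g(t) = t and h(t) = |t| differ exactly on the negative reals, and X takes a negative value only on an outcome of probability zero.

```python
from fractions import Fraction

P = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
X = {"a": 1, "b": 2, "c": -1}            # X is negative only on a null outcome

def g(t):
    return t

def h(t):
    return abs(t)

# Probability of the set of outcomes on which g(X) and h(X) differ.
p_differ = sum(P[w] for w in P if g(X[w]) != h(X[w]))

# Induced probability of M0 = {t : g(t) != h(t)}, here the negative reals.
p_X_M0 = sum(P[w] for w in P if X[w] < 0)
```

Both quantities are zero, so g(X) = h(X) a.s. and g = h a.s. $[P_X]$, even though g(X) and h(X) differ at the outcome "c" and g, h differ on a large set of real numbers.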

Independence of random vectors is expressed in terms of the events
they determine.

DEFINITION. An arbitrary class $\{X_i : i \in J\}$ of random vectors is
independent iff for each class $\{M_i : i \in J\}$ of Borel sets on the
respective codomains of the $X_i$, the class
$\{X_i^{-1}(M_i) : i \in J\}$ of events is independent.

This means that the product rule holds for each finite subclass of the
class of events. The following is known to be consistent with the above.

DEFINITION. Two classes $\{X_t : t \in T\}$ and $\{Y_u : u \in U\}$ form
an independent family of classes iff for each finite $T_n \subset T$ and
$U_m \subset U$ the random vectors $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$
and $(Y_{u_1}, Y_{u_2}, \ldots, Y_{u_m})$ form an independent pair.

The latter definition extends readily to arbitrary families of classes.
In the next section, we state the condition for independence of a class of
random vectors in terms of mathematical expectation.

If {X,Y} is an independent pair of random vectors (any finite
dimensions) and g, h are Borel functions on the codomains of X, Y,
respectively, then {g(X), h(Y)} is an independent pair. This follows from
the fact that $\{g(X) \in M\} = \{X \in g^{-1}(M)\}$ and
$\{h(Y) \in N\} = \{Y \in h^{-1}(N)\}$, so that

   $P(\{g(X) \in M\} \cap \{h(Y) \in N\})
      = P(\{X \in g^{-1}(M)\} \cap \{Y \in h^{-1}(N)\})
      = P[X \in g^{-1}(M)] P[Y \in h^{-1}(N)]
      = P[g(X) \in M] P[h(Y) \in N]$.

It should be apparent that this result extends to arbitrary classes.
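The preservation of independence under Borel functions can be confirmed exhaustively in a finite setting. The sketch below assumes two fair dice with X the first and Y the second; the functions `g` and `h` are hypothetical choices. Because g(X) and h(Y) each take only two values, the product rule can be tested for every pair of subsets M, N of their ranges.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice, independent
p = Fraction(1, 36)

def prob(pred):
    """Probability of the event {w : pred(w)}."""
    return sum(p for w in omega if pred(w))

def g(x):
    return x % 2                         # parity of the first die

def h(y):
    return y > 4                         # second die shows 5 or 6

# Product rule for every pair of subsets of the two ranges.
product_rule = all(
    prob(lambda w: g(w[0]) in M and h(w[1]) in N)
    == prob(lambda w: g(w[0]) in M) * prob(lambda w: h(w[1]) in N)
    for M in [set(), {0}, {1}, {0, 1}]
    for N in [set(), {False}, {True}, {False, True}]
)
```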



2. Mathematical expectation

The concept of mathematical expectation incorporates the notion of a
probability weighted average. Suppose X is a simple, real-valued random
variable with range $\{t_1, t_2, \ldots, t_n\}$. The mathematical
expectation of X is

   $E[X] = \sum_{i=1}^{n} t_i P(X = t_i)$.

Each possible value $t_i$ is weighted by the probability that value will be
realized; these weighted values are summed to give a probability weighted
sum; since the total weight is one, the sum is the same as the average.
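The defining sum for a simple random variable translates directly into a few lines of code. The distribution below is a hypothetical example, and `expectation` is an illustrative helper rather than anything defined in the monograph.

```python
from fractions import Fraction

# Hypothetical simple random variable: value t_i mapped to P(X = t_i).
distribution = {-1: Fraction(1, 2), 0: Fraction(1, 3), 4: Fraction(1, 6)}

def expectation(dist):
    """E[X] = sum of t_i * P(X = t_i) over the range of X."""
    return sum(t * p for t, p in dist.items())

total_weight = sum(distribution.values())
mean = expectation(distribution)
```

Here the weights sum to one and the probability weighted sum is (-1)(1/2) + (0)(1/3) + (4)(1/6) = 1/6.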

To extend the notion, we consider next a nonnegative random variable
X. In this case, there is a nondecreasing sequence of simple random
variables $X_n$ which converge to X. We define

   $E[X] = \int X \, dP = \lim_{n} E[X_n]$.

A study of the technical details shows that the limit does not depend upon
the particular approximating sequence selected. To complete the extension
to the general case, we represent X as the difference $X^+ - X^-$ of the
two nonnegative random variables defined as follows:

   $X^+(\omega) = X(\omega)$ for $X(\omega) \ge 0$,   $X^+(\omega) = 0$ for $X(\omega) < 0$;

   $X^-(\omega) = 0$ for $X(\omega) > 0$,   $X^-(\omega) = -X(\omega)$ for $X(\omega) \le 0$.

Then $E[X] = E[X^+] - E[X^-]$. Thus, E[X] is the limit of the probability
weighted averages of the values of the approximating simple functions. As
such, mathematical expectation should have properties of sums or averages
which "survive passage to a limit." This is, in fact, the case. The
defining procedure defines a very general type of integration (Lebesgue
integration).
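The decomposition of X into its positive and negative parts, and the resulting expression for the expectation, can be illustrated with a simple random variable. The distribution and the helper names `pos`, `neg`, and `E` below are assumptions made for the example.

```python
from fractions import Fraction

# Hypothetical simple random variable given by its distribution.
dist = {-3: Fraction(1, 4), -1: Fraction(1, 4), 4: Fraction(1, 2)}

def E(f):
    """Expectation of f(X) for the discrete distribution above."""
    return sum(f(t) * p for t, p in dist.items())

def pos(t):
    return t if t >= 0 else 0            # value of the positive part X+

def neg(t):
    return -t if t < 0 else 0            # value of the negative part X-

E_X = E(lambda t: t)
E_pos, E_neg = E(pos), E(neg)
decomposes = all(t == pos(t) - neg(t) for t in dist)
```

In this example E[X+] = 2 and E[X-] = 1, so that E[X] = E[X+] - E[X-] = 1.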

For convenience, we list and assign numbers to those properties of
mathematical expectation which are most useful in investigations such as
those in subsequent sections. Since an indicator function for an event
is a simple random variable whose range is {0, 1}, we have

E1) $E[I_A] = P(A)$.

Use of Theorem A1-1 and the fact that $I_{M \times N}(X,Y) = I_M(X) I_N(Y)$
gives the following important special cases.

E1a) $E[I_M(X)] = P(X \in M)$ and $E[I_M(X) I_N(Y)] = P(X \in M, Y \in N)$
     (with extension by mathematical induction to any finite number of
     random vectors).
Elementary arguments show that the following properties of sums hold also
for mathematical expectation in general.

E2) Linearity. E[aX + bY] = aE[X] + bE[Y] (with extension by
    mathematical induction to any finite linear combination).
E3) Positivity; monotonicity.
    a) $X \ge 0$ a.s. implies $E[X] \ge 0$, with equality iff X = 0 a.s.
    b) $X \ge Y$ a.s. implies $E[X] \ge E[Y]$, with equality iff X = Y a.s.
It should be noted that monotonicity follows from linearity and positivity.

The next property is not ordinarily discussed in elementary treatments.
However, it is essential to much of the theory of mathematical expectation.
Suppose $X_n \le X_{n+1} \le X$ a.s. for all $n \ge 1$ and
$X_n(\omega) \to X(\omega)$ for a.e. $\omega$. By property E3), we must
have $E[X_n] \le E[X_{n+1}] \le E[X]$. Since a bounded monotone sequence of
real numbers always converges, we must have
$\lim_{n \to \infty} E[X_n] = L \le E[X]$. Sophisticated use of elementary
ideas establishes the fact that the limit L = E[X]. A similar argument
holds for monotone decreasing sequences. Thus, we have

E4) Monotone convergence. If $X_n \to X$ monotonically a.s., then
    $E[X_n] \to E[X]$ monotonically.

In many ways, these four properties characterize mathematical expectation
as an integral. A surprising number of other properties stem from these. In
the development of the idea of conditional expectation, we establish its
integral-like character by establishing analogs of E1) through E4).

By virtue of the definition and property E1a) we can characterize
independence of random vectors as follows.

E5) Independence. The pair {X,Y} of random vectors is independent
    iff $E[I_M(X) I_N(Y)] = E[I_M(X)] E[I_N(Y)]$ for all Borel sets M, N
        on the codomains of X, Y, respectively,
    iff $E[g(X)h(Y)] = E[g(X)] E[h(Y)]$ for all real-valued Borel
        functions g, h such that the expectations exist.
For an arbitrary family of random vectors, we have independence iff such a
product rule holds for every finite subclass of two or more members.
The next property plays an essential role in the development of the
concept of conditional expectation. We prove the basic result, which
suffices for developing the properties of conditional expectation; the
extension, whose proof requires some advanced ideas from measure theory,
is used in developing certain equivalent conditions for conditional
independence, given a random vector (Sec D£). 
E6) Uniqueness.
    a) Suppose Y is a random vector with codomain $R^k$ and g, h are
       real-valued Borel functions on the range of Y. If
       $E[I_M(Y)g(Y)] = E[I_M(Y)h(Y)]$ for all Borel sets M in the codomain of Y,
       then $g(Y) = h(Y)$ a.s.
    b) More generally, if $E[I_M(Y)I_N(Z)g(Y,Z)] = E[I_M(Y)I_N(Z)h(Y,Z)]$ for
       all Borel sets M, N in the codomains of Y, Z, respectively,
       then $g(Y,Z) = h(Y,Z)$ a.s.
PROOF OF a).
Suppose $g(u) > h(u)$ for u in the set N. Then $I_N(Y)g(Y) \ge I_N(Y)h(Y)$,
with equality iff $Y(\omega)$ does not belong to N. By E3), $E[I_N(Y)g(Y)]
= E[I_N(Y)h(Y)]$ iff $I_N(Y)g(Y) = I_N(Y)h(Y)$ a.s. iff $P(Y \in N) = 0$.
A similar argument holds for the opposite inequality. Thus, the total
probability of the event $\{g(Y) \ne h(Y)\}$ is zero.

> 




**-*C J ' 

DISCUSSION OF b)

The second part is more general, since the sets $Q = M \times N$, with
$I_Q(Y,Z) = I_M(Y) I_N(Z)$, form only a subclass of the Borel sets on the
codomain of the combined vector (Y,Z). However, a standard type of argument in
measure theory shows that if equality holds for sets of this subclass,
it must hold for all Borel sets. Application of part a) gives the desired
result.

Several useful properties are based on E1) through E4), with monotone
convergence playing a key role. The following are among the most important.

E7) Fatou's lemma. If $X_n \ge 0$ a.s., then $E[\liminf X_n] \le \liminf E[X_n]$.

E8) Dominated convergence. If $X_n \to X$ a.s., and $|X_n| \le Y$ a.s., for each
    n, with E[Y] finite, then $E[X_n] \to E[X]$.

E9) Countable additivity. Suppose E[X] exists and A is the disjoint union
    $A = \bigcup_{i=1}^{\infty} A_i$. Then
    $E[I_A X] = \sum_{i=1}^{\infty} E[I_{A_i} X]$.

The following property is used as the basis for a general definition
of conditional expectation, given a random vector. It is based on the
celebrated Radon-Nikodym theorem and the fact, noted in the previous
section, that if Z is measurable-Y, then there is a Borel function e
such that $Z = e(Y)$. We accept this result without proof. It is made
plausible in certain special cases in the developments in Sec C2.

E10) Existence. If E[g(X)] is finite, then there is a real-valued
     Borel function e, unique a.s. $[P_Y]$, such that
     $E[I_M(Y)g(X)] = E[I_M(Y)e(Y)]$ for all Borel sets M in the codomain
     of Y.

Recall that, by Theorem A1-2, e is unique a.s. $[P_Y]$ iff e(Y) is unique a.s.

A number of standard inequalities are employed repeatedly in probability
theory. Establishment of these depends upon setting up the appropriate
inequalities on random variables, then utilizing monotonicity E3). The
appropriate inequalities on the random variables are often expressions
of classical inequalities in ordinary analysis. Some of the more important
inequalities are listed for convenient reference in Appendix I.



3. Problems

A-1. For each of the following random variables, describe the sigma field
     $\mathcal{F}(X)$ determined by X.
     ii) $X = aI_A + bI_B + cI_C$ (canonical form)

A-2. If $X = -2I_A + 0I_B + I_C + 4I_D$ (canonical form), describe $X^{-1}(M)$ for
     i) $M = (-\infty, 0]$,  ii) $M = (-2, 1] \cup (2, 4]$,  iii) $M = (-\infty, 3]$.

A-3. Suppose X has distribution function $F_X$ with

        $F_X(t) = 0$            for $t < 0$
        $F_X(t) = (1 + 3t)/£$   for $0 \le t < 1$
        $F_X(t) = 1$            for $1 \le t$

     For which of the following functions, if any, is $g_i = g_j$ a.s. $[P_X]$?

        $g_1(t) = t + 1$  for all t

        $g_2(t) = 0$      for $t < 0$
        $g_2(t) = t + 1$  for $0 \le t \le 1$
        $g_2(t) = 2$      for $1 < t$

        $g_3(t) = t + k + 1$  for $k \le t < k + 1$, all integers k, all t

A-4. If X and Y are real random variables, let

        $X_+(\omega) = X(\omega)$ for $X(\omega) \ge 0$,    $X_+(\omega) = 0$ for $X(\omega) < 0$
        $X_-(\omega) = -X(\omega)$ for $X(\omega) < 0$,   $X_-(\omega) = 0$ for $X(\omega) \ge 0$

     Show that
     a) $X_+$ and $X_-$ are Borel functions of X, hence are random variables
     b) XY is a random variable
     c) aX + bY is a random variable (a, b are constants).

A-5. Suppose $g: R^m \to R^n$ and $f: R^n \to R^q$ are Borel functions. Show
     that the composition $f \circ g: R^m \to R^q$ is a Borel function.

A-6. Use Theorem A1-1 and property E1) for expectation to establish
     property E1a).



A-7. Use linearity E2) and positivity for expectation to establish
     monotonicity.

A-8. a) Suppose $X \ge 0$ and E[X] is finite. Use the monotone convergence
        theorem E4) to establish countable additivity E9) for expectation.
     b) Extend the result of part a) to the general case.

A-9. If X is real, use the fact that $X \le |X|$ and $-X \le |X|$ to
     establish the triangle inequality E11) for expectation.

A-10. Establish the mean-value theorem E12) for expectation.



£ ' r 



B. CONDITIONAL INDEPENDENCE OF EVENTS

   1. The Concept
   2. Some Patterns of Probable Inference
   3. A Classification Problem
   4. Problems


1. The Concept

In setting up a probability model for a system under study, the modeler
utilizes all available prior knowledge about the system to determine
probability assignments to appropriate events. This knowledge may be obtained
from systematic statistical study, or from mathematical deductions based
on assumptions supported by experience or experiment, or, less formally,
from the judgment of a decision maker. These probability assignments serve
to determine a prior probability measure. The probability P(A) of an
event A provides a measure of the likelihood of the occurrence of this
event.

Further experience or experiment may produce information which makes it
appropriate to revise the probability assignments to reflect new
likelihoods of various events. Such revisions amount to the introduction of
a new probability measure. Typically, the information received yields
partial knowledge of the character of the outcome. When properly expressed,
this new information serves to identify an event C which has occurred.
There may be subtleties and difficulties in determining exactly what this
conditioning event C is (cf. Pfeiffer and Schum [1973], Sec 5-1). The
difficulties center about the question: What information is obtained by
whom? But, in principle at least, such an event is determined.

There is nothing in the probability model to require a specific manner
of reassigning probabilities. However, considerable experience has shown
that a fruitful way to make the new assignment of probability to event A,
given the occurrence of conditioning event C, is to utilize the rule

     $P(A|C) = P(AC)/P(C)$, provided, of course, $P(C) > 0$.

We call $P(A|C)$ the conditional probability of A, given C. For fixed
C, $P(\cdot|C)$ is a new probability measure, with all the formal properties
of the original, or prior, probability measure $P(\cdot)$.

It sometimes happens that occurrence of the event C does not affect
the likelihood that A will (or will not) occur. Thus, we may be able to
assert that $P(A|C) = P(A)$ or $P(A|C) = P(A|C^c)$. As a matter of fact,
straightforward use of the defining relation for conditional probability
shows that if $0 < P(A) < 1$ and $0 < P(C) < 1$, then the following sixteen
relations are equivalent; that is, if one holds, so do the others.

   $P(A|C) = P(A)$          $P(C|A) = P(C)$          $P(AC) = P(A)P(C)$
   $P(A|C^c) = P(A)$        $P(C^c|A) = P(C^c)$      $P(AC^c) = P(A)P(C^c)$
   $P(A^c|C) = P(A^c)$      $P(C|A^c) = P(C)$        $P(A^cC) = P(A^c)P(C)$
   $P(A^c|C^c) = P(A^c)$    $P(C^c|A^c) = P(C^c)$    $P(A^cC^c) = P(A^c)P(C^c)$

   $P(A|C) = P(A|C^c)$   $P(A^c|C) = P(A^c|C^c)$   $P(C|A) = P(C|A^c)$   $P(C^c|A) = P(C^c|A^c)$

If any of these holds, we suppose the events A, C form an independent
pair, in a probabilistic sense. It is easy to check that the equivalence
of the four product rules in the right-hand column holds for the cases
in which either P(A) or P(C) takes one of the extreme values 0 or 1.
Also, the first product rule is symmetric with respect to the events A, C.
Thus, it is convenient to make the definition of independence in terms of
the product rule, as follows:

DEFINITION. The pair {A,B} of events is (stochastically) independent
iff the product rule $P(AB) = P(A)P(B)$ holds.

An arbitrary class of events is independent iff a corresponding product
rule holds for every finite subclass of two or more events from the class.
The list of equivalent relations above (with C replaced by B) shows that

     If any one of the pairs {A,B}, $\{A,B^c\}$, $\{A^c,B\}$, or $\{A^c,B^c\}$ is
     independent, so are the others.
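
The equivalence of the sixteen relations can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `independence_relations` and the sample probabilities are illustrative assumptions. Given P(A), P(C), and P(AC), it evaluates each of the sixteen relations for the pair {A, C}.

```python
from itertools import product
from math import isclose


def independence_relations(p_a, p_c, p_ac):
    """Return the truth values of the sixteen relations for events A, C.

    p_a = P(A), p_c = P(C), p_ac = P(AC). Requires 0 < P(A) < 1 and
    0 < P(C) < 1 so that every conditional probability is defined.
    """
    # Joint probabilities; True stands for the event, False for its complement.
    joint = {
        (True, True): p_ac,
        (True, False): p_a - p_ac,
        (False, True): p_c - p_ac,
        (False, False): 1 - p_a - p_c + p_ac,
    }
    marg_a = {True: p_a, False: 1 - p_a}
    marg_c = {True: p_c, False: 1 - p_c}

    results = []
    for a, c in product([True, False], repeat=2):
        # First column: P(A*|C*) = P(A*)
        results.append(isclose(joint[a, c] / marg_c[c], marg_a[a]))
        # Second column: P(C*|A*) = P(C*)
        results.append(isclose(joint[a, c] / marg_a[a], marg_c[c]))
        # Third column: product rule P(A*C*) = P(A*)P(C*)
        results.append(isclose(joint[a, c], marg_a[a] * marg_c[c]))
    for a in (True, False):
        # P(A*|C) = P(A*|C^c)
        results.append(isclose(joint[a, True] / marg_c[True],
                               joint[a, False] / marg_c[False]))
    for c in (True, False):
        # P(C*|A) = P(C*|A^c)
        results.append(isclose(joint[True, c] / marg_a[True],
                               joint[False, c] / marg_a[False]))
    return results


# Hypothetical values: the product rule holds in the first call only.
print(all(independence_relations(0.4, 0.25, 0.10)))
print(any(independence_relations(0.4, 0.25, 0.20)))
```

In this sketch the relations stand or fall together: when the product rule holds, all sixteen checks succeed, and when it fails, all sixteen fail.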

\ 

Although the product rule is the basis of the formal definition, the essential
idea of independence is the lack of conditioning as exhibited in the fact
that independence holds iff $P(A|B) = P(A|B^c) = P(A)$ iff $P(B|A) =
P(B|A^c) = P(B)$. The occurrence or nonoccurrence of B does not affect the
likelihood of the occurrence of A, and the occurrence or nonoccurrence of
A does not affect the likelihood of the occurrence of B.
Example B1-a

Consider two contractors working on two entirely different jobs. Let
     A = event contractor "a" completes his job on schedule,
     B = event contractor "b" completes his job on schedule.
It may well be that these two contractors work in a way that the performance
of either has no effect on or relation to the performance of the other.
Thus, it may be that $P(A|B) = P(A|B^c)$, in which case the common value
is P(A). We should thus assume, in modeling the situation, that {A,B} is
an independent pair of events.

h * 

Suppose {A,B} form an independent pair under the original probability
measure. This independence is not an inherent property of the events (unless
one is either the impossible event $\emptyset$ or the sure event). Stochastic
independence is a property of the probability assignment, hence is determined
by the probability measure $P(\cdot)$. A change to a new probability measure
may destroy the stochastic independence. The following extension of the
contractor example shows how stochastic independence may fail to hold, even
though the contractors work "independently", in an operational sense. It
also leads to the concept of conditional independence.
Example B1-b

Consider again the case of the two contractors. There may be some factor
in the work situation which affects the performance of both. Suppose the
jobs are outside, where performance can be affected by the weather. Let
C = event the weather is "good." It may be reasonable to suppose that
$P(A|BC) = P(A|B^cC)$. That is, given good weather (i.e., the occurrence
of C), the performance of contractor "b" has no effect on the performance
of contractor "a". A similar situation may hold in the case of
bad weather. Since $P(A|C) = P(AB|C) + P(AB^c|C) = P(B|C)P(A|BC) +
P(B^c|C)P(A|B^cC)$, the equality $P(A|BC) = P(A|B^cC)$ implies that the common
value is $P(A|C)$. Under these conditions, the pair {A,B} will usually not
be independent. There is a "probabilistic tie" between these two events
by virtue of their relationships to the common event C. Let us examine
the contractor example further by assigning some reasonable numerical
values. Suppose

     $P(A|C) = 0.95$      $P(B|C) = 0.96$      $P(C) = 0.7$
     $P(A|C^c) = 0.45$    $P(B|C^c) = 0.50$    $P(C^c) = 0.3$.

Under the conditions $P(A|BC) = P(A|B^cC)$ and $P(A|BC^c) = P(A|B^cC^c)$ we have

     $P(AB) = P(C)P(B|C)P(A|BC) + P(C^c)P(B|C^c)P(A|BC^c)$
            $= 0.7 \times 0.96 \times 0.95 + 0.3 \times 0.5 \times 0.45 = 0.7059$

     $P(A)P(B) = [P(A|C)P(C) + P(A|C^c)P(C^c)][P(B|C)P(C) + P(B|C^c)P(C^c)]$
               $= [0.95 \times 0.7 + 0.45 \times 0.3][0.96 \times 0.7 + 0.5 \times 0.3] = 0.6576$.

Thus, $P(AB) \ne P(A)P(B)$, so {A,B} is not independent. If the contractors
work "independently", what is the tie between their performances? If A
occurs, the likelihood of good weather is high, so that the likelihood of
the occurrence of B is high. The numbers turn out to be $P(C|A) =
P(A|C)P(C)/P(A) = 0.83 > 0.7 = P(C)$ and $P(B|A) = P(AB)/P(A) = 0.882 >
0.822 = P(B)$. If this is the only effective tie between events A and
B, then once the weather is determined, there is no further influence
of the performance of one contractor on that of the other.
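
The arithmetic of Example B1-b can be reproduced in a few lines. This is a sketch for checking the numbers only; the variable names are illustrative assumptions, and the probabilities are those given in the example.

```python
# Probabilities from Example B1-b; "C" is good weather, "Cc" its complement.
p_w = {"C": 0.7, "Cc": 0.3}
p_a_given = {"C": 0.95, "Cc": 0.45}   # P(A|C), P(A|C^c)
p_b_given = {"C": 0.96, "Cc": 0.50}   # P(B|C), P(B|C^c)

# Total probabilities of A and B.
p_a = sum(p_a_given[w] * p_w[w] for w in p_w)
p_b = sum(p_b_given[w] * p_w[w] for w in p_w)

# P(AB) under conditional independence, given C and given C^c.
p_ab = sum(p_a_given[w] * p_b_given[w] * p_w[w] for w in p_w)

# The "probabilistic tie": conditioning on A raises the likelihood of C and B.
p_c_given_a = p_a_given["C"] * p_w["C"] / p_a
p_b_given_a = p_ab / p_a

print(f"P(AB)     = {p_ab:.4f}")
print(f"P(A)P(B)  = {p_a * p_b:.4f}")
print(f"P(C|A)    = {p_c_given_a:.2f}")
print(f"P(B|A)    = {p_b_given_a:.3f}")
```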



Let us examine further the assumption that $P(A|BC) = P(A|B^cC)$.
Straightforward use of the defining relation for conditional probability and
some elementary properties shows that the following conditions are equivalent:

   $P(A|BC) = P(A|C)$          $P(B|AC) = P(B|C)$          $P(AB|C) = P(A|C)P(B|C)$
   $P(A|B^cC) = P(A|C)$        $P(B^c|AC) = P(B^c|C)$      $P(AB^c|C) = P(A|C)P(B^c|C)$
   $P(A^c|BC) = P(A^c|C)$      $P(B|A^cC) = P(B|C)$        $P(A^cB|C) = P(A^c|C)P(B|C)$
   $P(A^c|B^cC) = P(A^c|C)$    $P(B^c|A^cC) = P(B^c|C)$    $P(A^cB^c|C) = P(A^c|C)P(B^c|C)$

   $P(A|BC) = P(A|B^cC)$       $P(A^c|BC) = P(A^c|B^cC)$
   $P(B|AC) = P(B|A^cC)$       $P(B^c|AC) = P(B^c|A^cC)$.

In view of our discussion above, it seems reasonable to call the common
situation conditional independence, given C. Once C occurs, the occurrence
or nonoccurrence of B does not further affect the likelihood of A, etc.
As in the case of ordinary or total independence, we utilize the product
rule as the basis of the mathematical definition, although some of the other
equivalent relationships may be more useful in modeling.

DEFINITION. The pair {A,B} of events is conditionally independent,
given C, iff the product rule $P(AB|C) = P(A|C)P(B|C)$ holds.

An arbitrary class of events is conditionally independent, given C, iff
a corresponding product rule holds for every finite subclass of two or more
events from the class.

The product rule shows that conditional independence, given C, is
just ordinary independence for the probability measure $P_C(\cdot) = P(\cdot|C)$.
Conditioning by C leads to a new probability measure. In terms of this
new probability measure, the pair {A,B} is stochastically independent.
As for the prior probability measure, we can assert that

     If any of the pairs {A,B}, $\{A,B^c\}$, $\{A^c,B\}$, or $\{A^c,B^c\}$ is
     conditionally independent, given C, then so are the others.

In Example B1-b, the conditioning event C is such that we have
conditional independence, given C, and also given $C^c$. If the weather
is good, the contractors work independently; they also work independently
if the weather is bad. Such is not the case for all conditioning events.
Example B1-c

Suppose the two contractors of the previous example use some common item.
Let D = event this item is in good supply and $D^c$ = event this item is in
short supply. If the supply is good, it is reasonable to suppose that the
performance of one contractor has no effect on that of the other. Hence,
it is reasonable to assume that $P(A|BD) = P(A|B^cD)$, which is equivalent
to assuming {A,B} is conditionally independent, given D. However, if
the supply is short (i.e., if $D^c$ occurs), the contractors may be in
competition for the scarce item. Thus it may be reasonable to suppose
$P(A|BD^c) < P(A|B^cD^c)$. If contractor "b" completes his job on time he has
probably obtained the scarce item to the detriment of contractor "a". This
condition violates one of the equivalent conditions for conditional independence
of {A,B}, given $D^c$, so that we must assert conditional nonindependence.
It is not difficult to show that in this case the pair {A,B} is not
totally independent.

The following development shows that conditional independence, given
one or both of C and $C^c$, is unlikely to yield total independence.
In the case of conditional independence, given C, and given $C^c$, we have

     $P(AB) = P(A|C)P(B|C)P(C) + P(A|C^c)P(B|C^c)P(C^c)$.

In the case of conditional independence, given C, but conditional
nonindependence, given $C^c$, we have

     $P(AB) = P(A|C)P(B|C)P(C) + P(AB|C^c)P(C^c)$.

In either case, we have

     $P(A)P(B) = P(A|C)P(B|C)P^2(C) + P(A|C^c)P(B|C^c)P^2(C^c)$
               $+ [P(A|C)P(B|C^c) + P(A|C^c)P(B|C)]P(C)P(C^c)$.

Only in unusual cases would we have $P(AB) = P(A)P(B)$. An example is
provided in Problem B-5.
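
A short calculation, added here as a check and not taken from the original development, makes "unusual" precise for the case of conditional independence given both C and its complement. Writing p = P(C) and q = P(C^c) = 1 - p (shorthand introduced only for this sketch), and subtracting the expression for P(A)P(B) from that for P(AB):

```latex
\begin{aligned}
P(AB) - P(A)P(B)
  &= P(A|C)P(B|C)\,p(1-p) + P(A|C^c)P(B|C^c)\,q(1-q) \\
  &\qquad - \bigl[P(A|C)P(B|C^c) + P(A|C^c)P(B|C)\bigr]\,pq \\
  &= pq\,\bigl[P(A|C) - P(A|C^c)\bigr]\bigl[P(B|C) - P(B|C^c)\bigr].
\end{aligned}
```

Hence, when 0 < P(C) < 1, total independence holds in this case iff P(A|C) = P(A|C^c) or P(B|C) = P(B|C^c), that is, iff at least one of the pairs {A,C}, {B,C} is independent. With the numbers of Example B1-b, the difference is 0.7 x 0.3 x 0.50 x 0.46 = 0.0483 = 0.7059 - 0.6576.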



2. Some patterns of probable inference

We now consider a commonly encountered pattern of probable inference. 
We begin by giving two examples, then lifting out the essential pattern. 
When the appropriate conditional independence is identified, we show how 
it may help in determining the desired posterior odds. 



Example B2-a

Associated with a certain disease are several symptoms. The presence
of the symptoms does not guarantee the presence of the disease, but with
high probability they occur when the disease is experienced and do not
occur when the disease is absent. The symptoms are observed by chemical tests
of blood samples. The tests themselves are not conclusive, but have high
probability of detecting the presence or absence of the symptoms correctly.
Now the chemical tests respond only to appropriate conditions in the blood
and are not influenced by how the patient feels or otherwise responds to
his condition. Let H = event the patient has the disease, D = event the
symptoms occur (in the blood condition), and R = event the tests indicate
the presence of the symptoms. Since the tests respond to the symptoms and
not directly to the disease, it seems reasonable to suppose $P(R|DH) =
P(R|DH^c)$ and $P(R|D^cH) = P(R|D^cH^c)$, so that {R,H} is conditionally
independent, given D, and given $D^c$.

Example B2-b

A firm plans to market a new product nationally. Suppose the market
may be characterized reasonably unambiguously as "favorable" or
"unfavorable". The company executives decide to check market conditions in a
test area. Let H = event the national market is favorable, D = event the test
market is favorable. Past experience allows reasonable estimates of
$P(D|H)$ and $P(D|H^c)$. However, direct, completely reliable determination of
the condition of the test-area market would be time consuming, expensive,
and would entail the risk of a competitor capturing the market. A market
survey of the test area is made. The results of such a survey are not
conclusive; but under the assumed conditions, they are affected only by the
conditions in the test area and not by existing conditions in the national
market, except as the latter conditions are reflected in the test area. If
R = event the survey shows the test market is favorable, we suppose that
$P(R|DH) = P(R|DH^c)$ and $P(R|D^cH) = P(R|D^cH^c)$. This means that {R,H} is
conditionally independent, given D, and given $D^c$.

These two examples exhibit features which are typical of a variety
of inference problems.

1) There is an objective system about which some inference is to be made.
   In the first example, the objective system is the patient; in the
   second, it is the national market. The objective system is presumed
   to be in one of two objective states (the patient has the disease or
   does not; the market is favorable or is not). If H = event the
   objective system is in one of these states, then prior odds
   $P(H)/P(H^c) = a > 0$ are supposed known (or are estimated).

2) The objective system is not directly observable, at least at the time
   of making the inference. But there is a data system which may be in
   one of several states (in each of the examples above, the data system
   is in one of two states). Each data state is "inconclusive" as to
   the objective state, but there is a "probabilistic linkage" between
   the data states and the objective states expressed in terms of
   appropriate conditional probabilities, as follows. Let $D_j$ = event the
   data system is in state j (in the two-state system, we use D
   and $D^c$). We suppose the conditional probabilities $P(D_j|H) = b_j > 0$
   and $P(D_j|H^c) = c_j > 0$ are known or may be estimated. Use of the
   ratio form of Bayes' rule shows the posterior odds to be
   $P(H|D_j)/P(H^c|D_j) = P(H)P(D_j|H)/P(H^c)P(D_j|H^c) = ab_j/c_j$.

3) In a typical situation, we do not have perfect information about the
   data state; rather, we have the report of an observer, or sensor.
   For simplicity, we discuss a two-state data system and let R = event
   the observer reports that D has occurred. If such a report is
   received, the effective posterior odds are $P(H|R)/P(H^c|R) =
   aP(R|H)/P(R|H^c)$. Since the objective system is not observable,
   $P(R|H)$ and $P(R|H^c)$ are usually not known. We suppose information
   is available about the reliability of the observer. That is, we
   suppose information is available to estimate $P(R|D) = d$ and $P(R|D^c) = e$
   with $0 < d < 1$ and $0 \le e < 1$. Note that "perfect information"
   about the data system requires e = 0 (for any positive value of d).

4) If the objective system is not observable, only the condition of the
   data system should affect the report. Thus, we should have $P(R|DH)
   = P(R|DH^c)$ and $P(R|D^cH) = P(R|D^cH^c)$. This is precisely the condition
   that {R,H} is conditionally independent, given D, and given $D^c$.
   This does not imply that {R,H} is independent.

5) Let us see how the assumption of conditional independence may help in
   determining the posterior odds, given the report.

   $$\frac{P(H|R)}{P(H^c|R)} = \frac{P(H)}{P(H^c)} \cdot \frac{P(RD|H) + P(RD^c|H)}{P(RD|H^c) + P(RD^c|H^c)}
     = a \, \frac{P(D|H)P(R|DH) + P(D^c|H)P(R|D^cH)}{P(D|H^c)P(R|DH^c) + P(D^c|H^c)P(R|D^cH^c)}.$$

   Under the assumed conditional independence, this becomes

   $$\frac{P(H|R)}{P(H^c|R)} = a \, \frac{P(D|H)P(R|D) + P(D^c|H)P(R|D^c)}{P(D|H^c)P(R|D) + P(D^c|H^c)P(R|D^c)},$$

   which may be determined from the data available.
For a more general formulation of this problem, with more than two objective
states and more than two data states, see Schum and Pfeiffer [1973]. To
illustrate the analysis, we return to the previous examples.

Example B2-a (Continued)

The objective system is the patient, selected at random from among those
who present themselves at the clinic, and H = event the patient has the
disease in question. Suppose 10 percent of the patients examined at the
clinic have the disease. Then prior odds $a = P(H)/P(H^c) = 1/9$. The data
system is the blood condition. Let D = event the patient has the symptoms
associated with the disease. Previous clinical experience shows $P(D|H) = 0.96$
and $P(D^c|H^c) = 0.95$. Let R = event the symptoms are indicated. The reliability
of the testing procedure is such that $P(R|D) = 0.97$ and $P(R|D^c) = 0.01$. The
patient is examined, a blood test made, and the report is found to be
positive (i.e., event R occurs). According to the pattern above

   $$\frac{P(H|R)}{P(H^c|R)} = \frac{1}{9} \cdot \frac{0.96 \times 0.97 + 0.04 \times 0.01}{0.05 \times 0.97 + 0.95 \times 0.01} = \frac{9316}{5220} \approx 1.78.$$

One positive result of the test changes the prior odds by a factor of
about 16. The conditional probability that the patient has the disease,
given the test result, is

   $$P(H|R) = \frac{9316/5220}{1 + 9316/5220} \approx 0.64.$$
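
The pattern of items 1) through 5) reduces to a one-line formula, which can be sketched as a small function. This is an illustrative sketch only: the function name `posterior_odds` and its parameter names are assumptions, not notation from the text. It presumes a two-state data system and that {R,H} is conditionally independent, given D, and given D^c.

```python
def posterior_odds(prior_odds, p_d_h, p_d_hc, p_r_d, p_r_dc):
    """Posterior odds P(H|R)/P(H^c|R) under the assumed conditional independence.

    prior_odds = P(H)/P(H^c)
    p_d_h      = P(D|H),   p_d_hc = P(D|H^c)
    p_r_d      = P(R|D),   p_r_dc = P(R|D^c)
    """
    # P(R|H) and P(R|H^c), each expanded over the data states D and D^c.
    num = p_d_h * p_r_d + (1 - p_d_h) * p_r_dc
    den = p_d_hc * p_r_d + (1 - p_d_hc) * p_r_dc
    return prior_odds * num / den


# Values from Example B2-a; P(D|H^c) = 1 - P(D^c|H^c) = 0.05.
odds = posterior_odds(1 / 9, 0.96, 0.05, 0.97, 0.01)
prob = odds / (1 + odds)          # convert odds to the probability P(H|R)
print(f"posterior odds = {odds:.2f}, P(H|R) = {prob:.2f}")
```

With a perfectly reliable report (P(R|D) = 1, P(R|D^c) = 0) the function returns the odds given D itself, a P(D|H)/P(D|H^c).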

We extend the second example to a slightly more general situation.

Example B2-b (Continued)

Consider the test-market problem described above. Initially, company
executives think the odds for a favorable market are $P(H)/P(H^c) = 3$.
Past studies indicate $P(D|H) = 0.8$ and $P(D|H^c) = 0.2$. If the test market
is found to be favorable (event D occurs), then

   $$\frac{P(H|D)}{P(H^c|D)} = \frac{P(D|H)P(H)}{P(D|H^c)P(H^c)} = \frac{0.8}{0.2} \times 3 = 12.$$

However, direct, completely reliable checking of even the test market
conditions would be time consuming, expensive, and would entail the risk of a
competitor capturing the market. Two market-survey firms are employed to
survey the test market. Each makes a survey and reports its conclusion
about the condition of the market. Let
     A = event firm "a" reports the test market is favorable
     B = event firm "b" reports the test market is favorable
The companies work "independently" in such a way that the investigation
carried out by one does not affect that carried out by the other, regardless
of the state of the test market. Because of the nature of the surveys,
the results cannot be completely reliable. Suppose

     $P(A|D) = 0.9$, $P(A|D^c) = 0.3$, $P(B|D) = 0.8$, and $P(B|D^c) = 0.2$.

Find the posterior odds $P(H|AB)/P(H^c|AB)$ for a favorable market if both
reports are favorable.
SOLUTION.

Again, we are faced with the problem of "independent tests." Complete
independence of {A,B} is not expected, for the outcomes of both tests
are related to the condition of the test market. However, since the
survey teams work in an operationally independent manner and neither team
is affected by the national market except as it influences the test market,
it seems reasonable to assume that $P(A|BD) = P(A|B^cD)$, $P(A|HD) = P(A|H^cD)$,
$P(B|HD) = P(B|H^cD)$, and $P(AB|HD) = P(AB|H^cD)$. These conditions imply
that {A,B,H} is conditionally independent, given D. A parallel argument
yields conditional independence, given $D^c$. We note that

     $P(AB|H) = P(ABD|H) + P(ABD^c|H) = P(D|H)P(AB|DH) + P(D^c|H)P(AB|D^cH)$
              $= P(D|H)P(A|D)P(B|D) + P(D^c|H)P(A|D^c)P(B|D^c)$

and similarly for conditioning event $H^c$. We may, therefore, write

   $$\frac{P(H|AB)}{P(H^c|AB)} = \frac{P(H)P(AB|H)}{P(H^c)P(AB|H^c)}
     = \frac{P(H)}{P(H^c)} \cdot \frac{P(D|H)P(A|D)P(B|D) + P(D^c|H)P(A|D^c)P(B|D^c)}{P(D|H^c)P(A|D)P(B|D) + P(D^c|H^c)P(A|D^c)P(B|D^c)}$$

   $$= 3 \times \frac{0.8 \times 0.9 \times 0.8 + 0.2 \times 0.3 \times 0.2}{0.2 \times 0.9 \times 0.8 + 0.8 \times 0.3 \times 0.2} = \frac{147}{16} \approx 9.2.$$

The value 9.2 is somewhat less than the odds of 12 obtained if perfect
information were available about the test market, as might be expected.
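
The same calculation extends to any number of favorable reports that are conditionally independent of one another and of H, given D, and given D^c. The sketch below is illustrative only; the function name `posterior_odds_reports` and the representation of each report as a pair (P(R_k|D), P(R_k|D^c)) are assumptions made for this example.

```python
from math import prod


def posterior_odds_reports(prior_odds, p_d_h, p_d_hc, reports):
    """Posterior odds for H, given that every listed report occurred.

    prior_odds = P(H)/P(H^c); p_d_h = P(D|H); p_d_hc = P(D|H^c).
    reports is a list of pairs (P(R_k|D), P(R_k|D^c)). The reports are
    assumed conditionally independent of each other and of H, given D,
    and given D^c, so their conditional probabilities multiply.
    """
    like_d = prod(r_d for r_d, _ in reports)      # P(all reports | D)
    like_dc = prod(r_dc for _, r_dc in reports)   # P(all reports | D^c)
    num = p_d_h * like_d + (1 - p_d_h) * like_dc
    den = p_d_hc * like_d + (1 - p_d_hc) * like_dc
    return prior_odds * num / den


# Example B2-b: firms "a" and "b" both report a favorable test market.
both = posterior_odds_reports(3, 0.8, 0.2, [(0.9, 0.3), (0.8, 0.2)])
print(f"posterior odds with both reports = {both:.4f}")
```

With an empty list of reports the function returns the prior odds unchanged, which is a convenient sanity check on the formula.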
If we do not have conditional independence, the problems are still
meaningful, but more detailed information is required for solution. Thus,
we need $P(R|DH)$, $P(R|D^cH)$, $P(R|DH^c)$, and $P(R|D^cH^c)$. However, in this
case it would be simpler to operate with $P(R|H)$ and $P(R|H^c)$, since R
must be treated as a datum directly related to H. The reason for not
doing this is that the objective system is not available for observation.
But it is precisely in this situation that we should assume that $P(R|D)
= P(R|DH)$, etc., since if the objective system is not available to the
observer, only the condition of the data system can affect the report.



3. A classification problem

Suppose subjects are drawn from two groups. Each subject answers a
battery of questions, or is otherwise tested with regard to a set of
characteristics. The result is a profile of data for each subject tested.
Each individual is to be classified in one of two groups, on the basis of
the test results. The problem may be formulated in probabilistic terms
as follows (see Schum and Pfeiffer [1977]).

There are n data classes $\mathcal{D}_i$, i = 1, 2, ..., n, one corresponding
to each question or test. Let
     $D_{ij}$ = event the answer to question i falls into category (i,j).
Then $\mathcal{D}_i = \{D_{i1}, D_{i2}, \ldots, D_{im_i}\}$. If the list of possible answers or
results is exhaustive and mutually exclusive, then $\mathcal{D}_i$ is a partition of
the basic space on which probability is defined.

We suppose the subjects are drawn from two mutually exclusive groups.
We let $G_k$ = event the individual interviewed belongs to the kth group.
In order to make probable inferences, we must suppose that the
probabilities $P(G_k)$ and $P(D_{ij}|G_k)$ are positive and known. If we assume that
no datum is conclusive, we must also have $0 < P(G_k|D_{ij}) < 1$ for all
permissible i, j, k. Since each $\mathcal{D}_i$ is a partition, we have
$\sum_j P(D_{ij}|G_k) = 1$ for each permissible i, k.

When an individual is interviewed, a profile is determined. A given
profile corresponds to an event $E_p = D_{1j_1} D_{2j_2} \cdots D_{nj_n}$. The various
possible profiles are mutually exclusive, so that events of the type $E_p$
constitute a partition. We ask, "What is the inferential value of the
compound event corresponding to a profile?" The usual answer is formulated
in terms of the likelihood ratio $L_p = P(E_p|G_1)/P(E_p|G_2)$ or, equivalently,
the log-likelihood ratio $\Lambda_p = \log L_p$. We may take logarithms to any base,
so long as we are consistent.

The problem, as it stands, would seem to require that we have
conditional probabilities for each profile, for each of the two groups. This
much data is rarely available, nor is it needed in a well-designed
experiment. In the usual experimental design, an attempt is made to formulate
the questions or tests in such a manner that responses or results are
"independent." Once more, we have the issue of conditional independence.
The probabilities of various answers to a given question should depend
upon the basic characteristics of the subject (hence on his or her group
membership), but should not depend upon his or her responses to the other
questions. That is, a given subject's response to a particular question
should be the same whether or not the other questions are asked, or
regardless of the order in which they are asked. This does not mean that the
responses to the questions are totally independent; the answers are
conditioned by the group to which the subject belongs (i.e., by the
characteristics common to that group), else the questions have no diagnostic
value. The desired independence holds within a given group, but the
probability distributions are different in the two groups. Hence we make
the assumption that the family $\{\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n\}$ is conditionally
independent, given $G_1$, and also given $G_2$. In this case

   $$L_p = \prod_i L_{ij_i} = \prod_i \frac{P(D_{ij_i}|G_1)}{P(D_{ij_i}|G_2)} \quad \text{and} \quad \Lambda_p = \sum_i \Lambda_{ij_i} = \sum_i \log L_{ij_i}.$$



We may carry the formalism further in a useful way by introducing the
random variables

     $T_i = \sum_j \Lambda_{ij} I_{D_{ij}}$   (has value $\Lambda_{ij}$ whenever $D_{ij}$ occurs).

If $E_p = D_{1j_1} D_{2j_2} \cdots D_{nj_n}$ occurs, then $T = \sum_i T_i$ has the value
$\Lambda_{1j_1} + \Lambda_{2j_2} + \cdots + \Lambda_{nj_n} = \Lambda_p$. Hence we utilize

     $T = \sum_i T_i = \sum_p \Lambda_p I_{E_p}$   (has value $\Lambda_p$ whenever $E_p$ occurs).

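Use of Bayes' theorem gives

   $$\log \frac{P(G_1|E_p)}{P(G_2|E_p)} = \log \frac{P(E_p|G_1)}{P(E_p|G_2)} + \log \frac{P(G_1)}{P(G_2)} = \Lambda_p + \log \frac{P(G_1)}{P(G_2)}.$$

Standard practice is to classify the subject in group 1 iff
$T \ge t_c = -\log [P(G_1)/P(G_2)]$, which corresponds to $P(G_1|E_p) \ge P(G_2|E_p)$.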

This formulation allows us to deal with the problem of misclassification
probabilities. Consider the conditional distribution functions $F_T(\cdot|G_1)$
and $F_T(\cdot|G_2)$, defined by $F_T(t|G_k) = P(T \le t|G_k)$, k = 1, 2. In the
conditionally independent case, $\{T_i: 1 \le i \le n\}$ is an independent class with
respect to each of the probability measures $P(\cdot|G_1)$ and $P(\cdot|G_2)$. The
central limit theorem ensures that for sufficiently large n both $F_T(\cdot|G_1)$
and $F_T(\cdot|G_2)$ are approximately normal. Examples show that the normal
approximation may be quite useful for n as small as 4 or 5.

With the conditional distributions for T, standard statistical
techniques may be utilized to determine the probabilities of misclassification
errors. Under some conditions, better choices of the decision level $t_c$
may be made. For a discussion of these issues, see Schum and Pfeiffer [1977].
Example B3-a

Subjects are to be classified in one of two groups. They are asked to
respond to a battery of six questions, each of which is to be answered in
one of three ways: yes, no, uncertain. To calibrate the test, a sample
of 100 subjects is interviewed intensively to determine the proper group
classification for each. It is found that 55 belong in group 1 and
45 belong in group 2. If $G_1$ = event a subject belongs to group 1
and $G_2$ = event a subject belongs to group 2, these data are taken to
mean that $P(G_1) = 0.55$ and $P(G_2) = 0.45$. The response of this control
or calibration group to the questions is tabulated as follows:



Group 1 (55 members)

  i    Yes (j = 0)    No (j = 1)    Uncertain (j = 2)
  1        17             26               12
  2         7             30               18
  3         8             40                7
  4        14             31               10
  5        15             25               15
  6         9             33               13

Group 2 (45 members)

  i    Yes (j = 0)    No (j = 1)    Uncertain (j = 2)
  1        30             10                5
  2        27         [illegible]      [illegible]
  3        29             12                4
  4        25             18                2
  5        14             18               13
  6        31              7                7



We have assigned, arbitrarily, numbers 0, 1, 2 to the answers yes, no,
uncertain, respectively. Thus, $D_{10}$ is the event the answer to question 1
is "yes," $D_{42}$ is the event the answer to question 4 is "uncertain," etc.
We interpret the data in the tables to mean that $P(D_{10}|G_1) = n_{10}/55 = 17/55$
and $P(D_{42}|G_2) = m_{42}/45 = 2/45$, etc.

A subject is selected at random from the population from which the
sample was taken. The subject's answers to the six questions, in order,
are: yes, yes, no, uncertain, no, yes. How should this subject be
classified?

SOLUTION.

The event $E = D_{10} D_{20} D_{31} D_{42} D_{51} D_{60}$ has occurred. We calculate the value
$T = \Lambda$ as follows:

$$\lambda_{10} = \log \frac{17/55}{30/45} = -0.769 \qquad \lambda_{20} = \log \frac{7/55}{27/45} = -1.551 \qquad \lambda_{31} = \log \frac{40/55}{12/45} = 1.003$$

$$\lambda_{42} = \log \frac{10/55}{2/45} = 1.409 \qquad \lambda_{51} = \log \frac{25/55}{18/45} = 0.128 \qquad \lambda_{60} = \log \frac{9/55}{31/45} = -1.437$$

Summing gives $\Lambda = -1.217$.

We also find $t_c = \log [P(G_2)/P(G_1)] = \log (0.45/0.55) = -0.201$. We thus have
$T = \Lambda = -1.217 < -0.201 = t_c$; hence we classify the subject in group 2.
To consider classification error probabilities, we could assume the
conditional distributions for $T$, given $G_1$ and given $G_2$, to be approximately
normal. By obtaining conditional means and variances for the various
$T_i$, we could obtain the conditional means and variances for $T$, given $G_1$
and given $G_2$. Standard statistical methods could then be utilized. We do
not pursue these matters, since our primary concern is the role of conditional
independence in formulating the problem. []
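The arithmetic of the solution can be retraced with a short script. This is only a sketch: the dictionary name `counts` and the helper `log_likelihood_ratio` are illustrative, only the six table entries used by this subject's answers are entered, and natural logarithms are assumed because they reproduce the printed values.

```python
import math

N1, N2 = 55, 45  # sizes of the calibration groups 1 and 2

# (question i, answer j) -> (count in group 1, count in group 2),
# restricted to the subject's profile: yes, yes, no, uncertain, no, yes.
counts = {
    (1, 0): (17, 30),
    (2, 0): (7, 27),
    (3, 1): (40, 12),
    (4, 2): (10, 2),
    (5, 1): (25, 18),
    (6, 0): (9, 31),
}

def log_likelihood_ratio(n1, n2):
    """lambda_ij = log[ P(D_ij | G_1) / P(D_ij | G_2) ], natural log assumed."""
    return math.log((n1 / N1) / (n2 / N2))

lambdas = {key: log_likelihood_ratio(n1, n2) for key, (n1, n2) in counts.items()}
total = sum(lambdas.values())                # value of T on the event E
t_c = math.log((N2 / 100) / (N1 / 100))      # log[ P(G_2) / P(G_1) ]
group = 1 if total > t_c else 2

for (i, j), lam in lambdas.items():
    print(f"lambda_{i}{j} = {lam:.3f}")
print(f"Lambda = {total:.3f}, t_c = {t_c:.3f}, classify in group {group}")
```

Because the conditional independence assumption turns the likelihood ratio into a product, the whole decision reduces to a sum of six table lookups compared against one threshold.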

It is not necessary that all the questions be conditionally independent.
There could be some intentional redundancies, leading to conditional
dependencies within each group. Suppose in the numerical example above
that questions 1 and 2 were made to interlock. Then it would be
necessary to consider this pair of questions as a single composite question
with nine possible answers. Frequency data would be required on each
pair of answers (no, no), (no, yes), (no, uncertain), (yes, no), (yes, yes),
(yes, uncertain), (uncertain, no), (uncertain, yes), (uncertain, uncertain).
One would still suppose conditional independence for the set of questions,
provided this composite question is dealt with as one question. More
complex groupings could be made, increasing the amount of data needed to
utilize the classification procedure, but there would be no difference in
principle.




4. Problems

B-1 Prove the equivalence of at least four of the sixteen conditions for
independence of $\{A,B\}$.

B-2 Complete the argument in Example B1-b to show that the equality
$P(A|BC) = P(A|B^cC)$ implies that the common value is $P(A|C)$.

B-3 Establish the equivalence of at least four of the sixteen conditions
for conditional independence of $\{A,B\}$, given $C$.

B-4 Show that the condition $P(A|BD^c) < P(A|B^cD^c)$ in Example B1-c implies
$P(A|BD^c) < P(A|D^c)$.

B-5 A group of sixteen students has an equal number of males and females.
One fourth of the females and three fourths of the males like to play
basketball. One half of each likes to play volleyball. A student is
selected from the group at random, on an equally likely basis. Let

A = event the student likes basketball,
B = event the student likes volleyball,
C = event the student is male.

Suppose $\{A,B\}$ is conditionally independent, given $C$, and conditionally
independent, given $C^c$. Show that $\{A,B\}$ is independent and $\{B,C\}$
is independent, but $\{A,B,C\}$ is not independent.
B-6 In Example B2-b, show that the conditions i) $P(A|BD) = P(A|B^cD)$,
ii) $P(A|HD) = P(A|H^cD)$, iii) $P(B|HD) = P(B|H^cD)$, and iv) $P(AB|HD)$
$= P(AB|H^cD)$ together imply that $\{A,B,H\}$ is conditionally independent,
given $D$.

B-7 In Example B2-b, determine $P(H|AB^c)/P(H^c|AB^c)$, the conditional
odds, given conflicting reports of "favorable" by "a" and "unfavorable"
by "b".



B-8 Consider the following problem, stated in a manner common in the
literature. A patient is given a test for a type of cancer. The
probability of a false positive is 0.10. The probability of a false
negative is 0.20. One percent of the tested population is known to
have the disease. If a patient receives two independent tests, and
both are positive, find the probability the patient has cancer.

a) Let C = event the person selected has the given type of cancer,
$T_1$ = event the first test indicates cancer (is positive),
$T_2$ = event the second test indicates cancer.
Discuss the reasonableness of the assumptions that $\{T_1,T_2\}$ is
conditionally independent, given $C$, and is conditionally
independent, given $C^c$.

b) Under these assumptions, determine $P(C|T_1T_2)$.

c) Under these assumptions, determine $P(C|T_1T_2^c)$.

B-9 A student decides to determine the odds on the forthcoming football
game with State University. The odds depend heavily on whether State's
star quarterback, recently injured, will play. A couple of phone calls
yield two opinions whether the quarterback will play. Each report
depends only on facts related to the condition of the quarterback and
not on the outcome of the game (which is not known, of course). The
two advisers have operated quite independently in arriving at their
estimates. The student proceeds as follows. He lets

W = event the home team wins the game,
Q = event the star quarterback plays for State,
A = event the first informant is of the opinion he will play,
B = event the second informant is of the opinion he will play.

The student (having studied Example B2-b) decides to assume $\{W,A,B\}$
is conditionally independent, given $Q$, and conditionally independent,
given $Q^c$. On the basis of past experience he assesses the reliability
of his advisers and assumes the following probabilities: $P(A|Q) =$
$P(A^c|Q^c) = 0.8$, $P(B|Q) = 0.6$, and $P(B^c|Q^c) = 0.7$. Initially, he
could only assume $P(Q) = P(Q^c) = 1/2$. Expert opinion assigns the
odds $P(W|Q)/P(W^c|Q) = 1/3$ and $P(W|Q^c)/P(W^c|Q^c) = 3/2$. On the basis
of these assumptions, determine the odds $P(W|AB^c)/P(W^c|AB^c)$ and the
probability $P(W|AB^c)$.



B-10 A student is picked at random from a large freshman class in calculus.
Let
T = event the student had a previous trigonometry course,
A = event the student made grade "A" on the first examination,
B = event the student made grade "B" or better in the course.
Data on the class indicate that
$P(T) = 0.60$   $P(A|T)$ = [illegible]   $P(A|T^c) = 0.30$
$P(B|AT) = P(B|A) = 0.60$   $P(B|A^cT^c) = P(B|A^c) = 0.30$.

a) The student selected made "B" or better. What is the probability
$P(T|B)$ that the student had a previous course in trigonometry?
b) Show that $\{T,B\}$ is not an independent pair.
B-11 Experience shows that 20 percent of the items produced on a production
line are defective with respect to surface hardness. An inspection
procedure has probability 0.1 of giving a false positive and probability
0.2 of giving a false negative. Units which fail to pass inspection
are given a corrective treatment which has probability 0.95 of
correcting any defective units and zero probability of producing any
adverse effects on the essential properties of the units treated.
However, with probability 0.3, the retreated units take on a
characteristic color, regardless of whether or not they are defective
(initially or finally). Let

$D_1$ = event the unit selected is defective initially
$I$ = event the unit failed inspection = event unit is retreated
$D_2$ = event the unit is defective after retreatment
$C$ = event the unit is discolored after retreatment

a) Show that it is reasonable to suppose that $\{C,D_1\}$ is conditionally
independent, given $I$, and that $\{C,D_2\}$ is conditionally
independent, given $ID_1$. [Note that $I^cC = \emptyset$ and $P(D_1^cD_2) = 0$.]

b) Determine $P(D_2|C)$, the probability that a unit is defective,
given that it is discolored.

B-12 In the classification problem, Example B3-a, determine the appropriate
classification if the answers to the six questions are: yes, no, no,
uncertain, yes, no, respectively.



C. CONDITIONAL EXPECTATION

1. Conditioning by an Event
2. Conditioning by a Random Vector - Special Cases
3. Conditioning by a Random Vector - General Case
4. Properties of Conditional Expectation
5. Conditional Distributions
6. Conditional Distributions and Bayes' Theorem
7. Proofs of Properties of Conditional Expectation
8. Problems



C. Conditional expectation

In order to introduce and develop the second concept of conditional
independence, we need to examine the concept of conditional expectation.
The usual introductory treatment of conditional expectation is intuitive,
straightforward, but severely limited in scope. More general treatments
tend to assume familiarity with advanced notions of measurability and
abstract integration theory. We seek to bridge the gap and make the
appropriate aspects of a general treatment more readily accessible.

1. Conditioning by an event

If a conditioning event $C$ occurs, we modify our probabilities by
introducing the conditional probability measure $P(\cdot|C)$. Thus, $P(A)$ is
replaced by $P(A|C) = P(AC)/P(C)$. In making this change, we do two things:
i) We limit the possible outcomes to those in event $C$.
ii) We "normalize" the probability mass in $C$ to make it the new unit
of mass.

It seems reasonable to make a corresponding modification of mathematical
expectation, which we view as a probability weighted average of the
values taken on by a random variable. Two possibilities are apparent.
a) We could modify the prior probability measure $P(\cdot)$ to the conditional
probability measure $P(\cdot|C)$, then take expectation (i.e., weighted
average) with respect to this new probability mass assignment.
b) We could continue to use the original probability measure $P(\cdot)$ and
modify our averaging process as follows:
   i) For a real random variable $X$, we consider the value $X(\omega)$ for
   only those $\omega$ in the event $C$. We do this by utilizing the
   random variable $I_C X$, which has the value $X(\omega)$ for $\omega$ in $C$,
   and has the value zero for any $\omega$ outside $C$. Then $E[I_C X]$ is
   the probability weighted sum of the values taken on by $X$ in
   the event $C$.
   ii) We divide the weighted sum by $P(C)$ to obtain the weighted
   average.
As shown by Theorem C1-1, below, these two approaches are equivalent.
For reasons which will become more apparent in subsequent developments,
we take the second approach as the basis for definition. For one thing,
we can do the "summing" in each case with the prior probability measure,
then obtain the average by dividing by $P(C)$ for the particular conditioning
event. This approach facilitates relating the present concept to the
more general concept of conditional expectation, given a random vector,
which is developed in the next two sections.

DEFINITION. If the event $C$ has positive probability and indicator
function $I_C$, the conditional expectation of $X$, given $C$, is the
quantity $E[X|C] = E[I_C X]/P(C)$.

Several properties may be established easily.

Theorem C1-1

a) $E[X|C]$ is expectation with respect to the conditional probability
measure $P(\cdot|C)$.

b) $E[I_A|C] = P(A|C)$.

c) If $C = \bigcup_i C_i$ (disjoint union), then $E[X|C]P(C) = \sum_i E[X|C_i]P(C_i)$.

PROOF OF a)
If $X$ is a simple random variable $\sum_k t_k I_{A_k}$, then

$$E[X|C] = E[I_C X]/P(C) = E\Big[\sum_k t_k I_C I_{A_k}\Big]/P(C) = \sum_k t_k E[I_{A_k C}]/P(C) = \sum_k t_k P(A_k|C) = E_C[X],$$

where the symbol $E_C[\cdot]$ indicates expectation with respect to the
conditional probability measure $P(\cdot|C)$.



If $X \geq 0$, then there is a sequence $\{X_n: 1 \leq n\}$ of simple random
variables increasing to $X$. This ensures that the sequence $\{I_C X_n: 1 \leq n\}$
is a sequence of simple random variables increasing to $I_C X$. By definition,

$$E[I_C X]/P(C) = \lim_n E[I_C X_n]/P(C) \quad \text{and} \quad E_C[X] = \lim_n E_C[X_n].$$

Since $E[I_C X_n]/P(C) = E_C[X_n]$ for each $n$, the limits must be the same.

In the general case, we consider $X = X_+ - X_-$, with both $X_+ \geq 0$ and
$X_- \geq 0$. By linearity,

$$E[I_C X]/P(C) = E[I_C X_+]/P(C) - E[I_C X_-]/P(C) = E_C[X_+] - E_C[X_-] = E_C[X].$$

Propositions b) and c) are established easily from properties of
mathematical expectation. []
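The equivalence asserted in part a) of Theorem C1-1 can be checked on a small finite case. The sketch below is illustrative only: the fair-die model, the event `C` (an even face), and all variable names are assumptions chosen for the check, not part of the text.

```python
from fractions import Fraction

# Hypothetical model: one roll of a fair die, X = face value, C = "face is even".
outcomes = range(1, 7)
P = {w: Fraction(1, 6) for w in outcomes}   # prior probability measure
X = lambda w: w
C = {2, 4, 6}

p_C = sum(P[w] for w in C)

# Approach b): weighted sum E[I_C X] under the prior measure, then divide by P(C).
weighted_sum = sum(X(w) * P[w] for w in outcomes if w in C)
cond_exp_b = weighted_sum / p_C

# Approach a): expectation with respect to the conditional measure P(.|C).
P_given_C = {w: P[w] / p_C for w in C}
cond_exp_a = sum(X(w) * P_given_C[w] for w in C)

print(cond_exp_a, cond_exp_b)
```

Exact rational arithmetic is used so that the two approaches can be compared for equality rather than approximate agreement.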



The following theorem provides a link between the present concept and
the more general concept developed in the next two sections.

Theorem C1-2

If event $C = Y^{-1}(M) = \{Y \in M\}$, for any Borel set $M$, has positive
probability, then $E[I_M(Y)g(X)] = E[g(X)|Y \in M]P(Y \in M)$.

PROOF.

By Theorem A1-1, $I_M(Y) = I_C$. By definition $E[I_C g(X)] = E[g(X)|C]P(C)$.
Hence, $E[I_M(Y)g(X)] = E[g(X)|Y \in M]P(Y \in M)$. []

It should be noted that both $X$ and $Y$ can be vector-valued. The function
$g$ must be real-valued, and $M$ is any Borel set on the codomain of $Y$.



2. Conditioning by a random vector - special cases

In this section, we consider two simple, but important, cases of
conditional expectation, given a random vector. We make an intuitive
approach, based on the idea of a conditional distribution. In each case,
the conditional expectation is found to be of the form $E[g(X)|Y = u] = e(u)$,
where $e(\cdot)$ is a Borel function defined on the range of $Y$. This
function satisfies, in each case, a fundamental equation which provides
a tie with the concept of conditional expectation, given an event, and
which serves as the basis for a number of important properties. This
fundamental equation also provides the basis for extending the concept of
conditional expectation, given a random vector, to the general case.

Case i) $X$, $Y$ discrete. $X = \sum_{i=1}^{n} t_i I_{A_i}$ and $Y = \sum_{j=1}^{m} u_j I_{B_j}$, where
$A_i = \{\omega: X(\omega) = t_i\}$ and $B_j = \{\omega: Y(\omega) = u_j\}$. We suppose $P(A_i) > 0$
and $P(B_j) > 0$ for each permissible $i$, $j$. Now

$E[g(X)|Y = u_j]P(Y = u_j) = E[g(X)|B_j]P(B_j)$
$= E[g(X)I_{B_j}]$   by def.
$= E[g(X)I_{\{u_j\}}(Y)]$   by Thm A1-1
$= \sum_i \sum_k g(t_i) I_{\{u_j\}}(u_k)\, p_{XY}(t_i,u_k)$
$= \sum_i g(t_i)\, p_{XY}(t_i,u_j)$   since $I_{\{u_j\}}(u_k) = 1$ iff $j = k$.

If we consider the conditional probability mass function

$$p_{X|Y}(t_i|u_j) = P(X = t_i|Y = u_j) = \frac{p_{XY}(t_i,u_j)}{p_Y(u_j)},$$

we may write

$$E[g(X)|Y = u_j]P(Y = u_j) = \Big[\sum_i g(t_i)\, p_{X|Y}(t_i|u_j)\Big] p_Y(u_j),$$

from which we get

$$E[g(X)|Y = u_j] = \sum_i g(t_i)\, p_{X|Y}(t_i|u_j) = e(u_j)$$ for each $u_j$ in the range of $Y$.



We may let $e(\cdot)$ be any continuous function which takes on the prescribed
values $e(u_j)$ for each $u_j$ in the range of $Y$. Then $e(\cdot)$ is a Borel
function. Suppose $M$ is any Borel set on the codomain of $Y$. Then

$E[I_M(Y)g(X)] = \sum_i \sum_k g(t_i) I_M(u_k)\, p_{XY}(t_i,u_k)$
$= \sum_i \sum_k g(t_i) I_M(u_k)\, p_{X|Y}(t_i|u_k)\, p_Y(u_k)$
$= \sum_k I_M(u_k) \Big[\sum_i g(t_i)\, p_{X|Y}(t_i|u_k)\Big] p_Y(u_k)$
$= \sum_k I_M(u_k)\, e(u_k)\, p_Y(u_k) = E[I_M(Y)e(Y)]$.

Hence, $e(\cdot)$ must satisfy

$$E[I_M(Y)g(X)] = E[I_M(Y)e(Y)] \quad \forall \text{ Borel sets } M \text{ in the codomain of } Y.$$

The uniqueness property E7) for expectations ensures $e(\cdot)$ is unique
a.s. $[P_Y]$, which in this case means $e(\cdot)$ is uniquely determined on the
range of $Y$.

Example C2-a
Suppose $X$, $Y$ produce the joint distribution shown in Fig. C2-1.
Determine the function $e(\cdot) = E[X|Y = \cdot]$.
SOLUTION.

From the joint distribution, we obtain the quantities

$p_Y(1) = p_Y(2) = 3/10 \qquad p_Y(3) = 4/10$

$p_{X|Y}(1|1) = p_{X|Y}(2|1) = p_{X|Y}(3|1) = 1/3$

$p_{X|Y}(4|1) = p_{X|Y}(5|1) = 0.$

Hence $e(1) = \frac{1}{3}(1 + 2 + 3) = 2$.

Similarly $e(2) = \frac{1}{3}(2 + 3 + 4) = 3$ and $e(3) = \frac{1}{4}(2 + 3 + 4 + 5) = 7/2$.
Graphical interpretation. The conditional probabilities $p_{X|Y}(k|u)$, for
fixed $u$, are proportional to the probability masses on the horizontal line
corresponding to $Y = u$. Thus, $E[X|Y = u]$ is the center of mass for
that part of the joint distribution which corresponds to $Y = u$. []
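The computation in Example C2-a can be mirrored in a few lines. Fig. C2-1 is not reproduced here, so the joint mass function below is an assumption: ten equally likely points, inferred from the marginal and conditional probabilities quoted in the solution. The names `p_xy`, `marginal_y`, and `e` are illustrative.

```python
from fractions import Fraction

# ASSUMED reconstruction of Fig. C2-1: ten points (t, u), each with mass 1/10.
support = [(1, 1), (2, 1), (3, 1),
           (2, 2), (3, 2), (4, 2),
           (2, 3), (3, 3), (4, 3), (5, 3)]
p_xy = {pt: Fraction(1, 10) for pt in support}

def marginal_y(u):
    """p_Y(u): total mass on the horizontal line Y = u."""
    return sum(p for (t, v), p in p_xy.items() if v == u)

def e(u, g=lambda t: t):
    """e(u) = E[g(X) | Y = u] = sum_i g(t_i) p_XY(t_i, u) / p_Y(u)."""
    return sum(g(t) * p for (t, v), p in p_xy.items() if v == u) / marginal_y(u)

# Fundamental equation E[I_M(Y) g(X)] = E[I_M(Y) e(Y)] for the sample set M = {1, 3}.
M = {1, 3}
lhs = sum(t * p for (t, v), p in p_xy.items() if v in M)
rhs = sum(e(v) * marginal_y(v) for v in M)

print(e(1), e(2), e(3), lhs, rhs)
```

The last two quantities illustrate the fundamental equation derived for Case i) on one particular set M; any other subset of the range of Y could be substituted.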

Case ii) $X$, $Y$ are absolutely continuous, with joint density function $f_{XY}$.
Since the event $\{Y = u\}$ has zero probability, we cannot begin with
conditional expectation, given the event $\{Y = u\}$. We may utilize the
intuitive notion of a conditional distribution, given $Y = u$, by employing
the following device. Let

$$f_{X|Y}(t|u) = \begin{cases} f_{XY}(t,u)/f_Y(u) & \text{for } f_Y(u) > 0 \\ 0 & \text{otherwise.} \end{cases}$$

For fixed $u$ such that $f_Y(u) > 0$ (i.e., in the range of $Y$), the function
$f_{X|Y}(\cdot|u)$ has the properties of a density function: $f_{X|Y}(t|u) \geq 0$ and
$\int f_{X|Y}(t|u)\,dt = 1$. It is natural to call this the conditional density
function for $X$, given $Y = u$. In part, the terminology is justified by
the following development. Let $M$ be any Borel set on the codomain of $Y$.
Then

$E[g(X)I_M(Y)] = \iint g(t) I_M(u) f_{XY}(t,u)\,dt\,du$
$= \int I_M(u) \Big[\int g(t) f_{X|Y}(t|u)\,dt\Big] f_Y(u)\,du$
$= \int I_M(u) e(u) f_Y(u)\,du = E[I_M(Y)e(Y)]$,
where $e(u) = \int g(t) f_{X|Y}(t|u)\,dt$.
Now $e(\cdot)$ must satisfy

$$E[I_M(Y)g(X)] = E[I_M(Y)e(Y)] \quad \forall \text{ Borel sets } M \text{ in the codomain of } Y.$$

It seems natural to call $e(u)$ the conditional expectation of $g(X)$, given
$Y = u$. In the case $P(Y \in M) > 0$, we have by Theorem C1-2

$$E[I_M(Y)e(Y)] = \int I_M(u) e(u) f_Y(u)\,du = E[g(X)|Y \in M]P(Y \in M).$$

If $e(\cdot)$ is Borel, as it will be in any practical case, property E7) for
expectation ensures that $e(Y)$ is a.s. unique, or $e(\cdot)$ is unique a.s. $[P_Y]$,
which means that it is determined essentially on the range of $Y$.



Example C2-b

Suppose $X$, $Y$ produce a joint distribution which is uniform over
the triangular region with vertices (0,0), (1,0), (0,1), as shown in
Fig. C2-2. Now

$$f_Y(u) = \int f_{XY}(t,u)\,dt = 2\int_0^{1-u} dt = 2(1 - u), \quad 0 < u < 1,$$

$$f_{X|Y}(t|u) = \frac{1}{1-u} \quad \text{for } 0 < t < 1 - u,\ 0 < u < 1 \text{ (and zero elsewhere).}$$

Hence

$$e(u) = E[X|Y = u] = \int t f_{X|Y}(t|u)\,dt = \frac{1}{1-u}\int_0^{1-u} t\,dt = \frac{1-u}{2}, \quad 0 < u < 1.$$

Graphical interpretation. The dashed line in Fig. C2-2 is the graph of
$e(u)$ vs. $u$. This could have been anticipated by the following graphical
interpretation. If $f_{XY}$ is continuous, we may visualize $f_{X|Y}(t|u)$ as
proportional to the mass per unit length in a very narrow strip on the plane
about the line corresponding to $Y = u$. $E[X|Y = u]$ is the center of mass
of the portion of the joint distribution lying in that narrow strip.
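The "narrow strip" picture lends itself to a rough simulation. The following sketch is not part of the original development: it draws points uniformly from the triangle by rejection sampling (with a fixed seed, sample size, and strip width chosen arbitrarily) and averages the X coordinates of the points falling in a thin strip about Y = u, for comparison with e(u) = (1 - u)/2.

```python
import random

random.seed(0)  # fixed seed so that repeated runs agree

def sample_triangle():
    """Draw one point uniformly from the triangle with vertices (0,0), (1,0), (0,1)."""
    while True:
        t, u = random.random(), random.random()
        if t + u < 1:          # keep only points below the hypotenuse
            return t, u

points = [sample_triangle() for _ in range(200_000)]

def strip_mean(u0, half_width=0.01):
    """Average of X over the sample points in the strip |Y - u0| < half_width."""
    xs = [t for t, u in points if abs(u - u0) < half_width]
    return sum(xs) / len(xs)

for u0 in (0.2, 0.4, 0.8):
    print(u0, round(strip_mean(u0), 3), (1 - u0) / 2)
```

The agreement is only approximate, since the strip has positive width and the sample is finite; narrowing the strip reduces the first source of error but leaves fewer points to average.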




3. Conditioning by a random vector - general case

The treatment of the special cases in the previous section begins
with the notion of a conditional distribution. While this approach is
intuitively appealing, and quite adequate for the simplest cases, it
quickly becomes unmanageable in more general cases which involve random
vectors of higher dimensions with mixed distributions. We seek a more
satisfactory approach.

We base our development on a simple property derived in each of the
two special cases considered in the previous section. In each case, the
quantity called the conditional expectation of $g(X)$, given $Y = u$, is
the value $e(u)$ of a Borel function $e(\cdot)$ which is defined on the range
of $Y$. The random variable $e(Y)$ satisfies

A) $E[I_M(Y)g(X)] = E[I_M(Y)e(Y)]$ $\forall$ Borel sets $M$ in the codomain of $Y$.

By the uniqueness property E6) for mathematical expectation, $e(Y)$ must
be a.s. unique, which is equivalent to the condition $e(\cdot)$ is unique
a.s. $[P_Y]$. By Theorem C1-2 on conditional expectation, given an event,
we have

B) If $P(Y \in M) > 0$, then $E[I_M(Y)e(Y)] = E[g(X)|Y \in M]P(Y \in M)$.

Motivated by these developments, we make the

DEFINITION. Let $e(\cdot)$ be a real-valued, Borel function defined on a
set which includes the range of random vector $Y$. Then the quantity
$e(u)$ is the conditional expectation of $g(X)$, given $Y = u$, denoted
$E[g(X)|Y = u]$, iff

A) $E[I_M(Y)g(X)] = E[I_M(Y)e(Y)]$ $\forall$ Borel sets $M$ in the codomain of $Y$.

Associated with the Borel function $e(\cdot)$ is the random variable $e(Y)$.
Now $e(\cdot)$ is unique a.s. $[P_Y]$ and $e(Y)$ is unique a.s.



DEFINITION. The random variable $e(Y)$ is called the conditional
expectation of $g(X)$, given $Y$, denoted $E[g(X)|Y]$.

Note that we must distinguish between the two symbols:

a) $E[g(X)|Y = \cdot] = e(\cdot)$, a Borel function on the range of $Y$

b) $E[g(X)|Y] = e(Y)$, a random variable; for a given $\omega$ we write
$E[g(X)|Y](\omega)$.

Example C3-a

If the conditioning random vector $Y$ is simple, an explicit representation
of $e(Y) = E[g(X)|Y]$ is obtained easily. Suppose $Y = \sum_{j=1}^{m} u_j I_{B_j}$ (in
canonical form; see Sec A1), so that $B_j = \{Y = u_j\}$ and $I_{B_j} = I_{\{u_j\}}(Y)$.
If $e(u) = E[g(X)|Y = u]$, then $e(\cdot)$ is defined for $u_j$ in the range
of $Y$ by $e(u_j) = E[g(X)|Y = u_j] = E[I_{\{u_j\}}(Y)g(X)]/P(Y = u_j)$ (conditional
expectation, given the event $\{Y = u_j\}$). Hence,

$$e(Y) = \sum_{j=1}^{m} e(u_j) I_{B_j} = \sum_{j=1}^{m} E[g(X)|Y = u_j]\, I_{\{u_j\}}(Y).$$

Thus, when the conditioning random vector is simple, so that $P(Y = u_j) > 0$,
the concepts of conditional expectation, given the event $\{Y = u_j\}$, and
of conditional expectation, given $Y = u_j$, coincide for $u_j$ in the range
of $Y$, and the same symbol is used for both. Use of formula B), above,
gives

$$E[g(X)|Y \in M]P(Y \in M) = E[I_M(Y)e(Y)] = \sum_j E[g(X)|Y = u_j]\, E[I_M(Y)I_{\{u_j\}}(Y)].$$

The quantity $E[I_M(Y)I_{\{u_j\}}(Y)] = P(Y = u_j)$ iff $u_j \in M$, and is zero
otherwise. []



Example C3-b

Consider the random variables $X$, $Y$ in Example C2-b. Let $M$ be the
semi-infinite interval $(-\infty, 0.5]$, so that $\{Y \in M\} = \{Y \leq 0.5\}$. Then

$$P(Y \in M) = \int_0^{0.5} f_Y(u)\,du = 3/4$$ [May be obtained geometrically.]

$$E[I_M(Y)e(Y)] = \int_0^{0.5} e(u) f_Y(u)\,du = \int_0^{0.5} (1 - u)^2\,du = 7/24.$$

Hence

$$E[X|Y \leq 0.5] = (7/24)/(3/4) = 7/18.$$ []
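The two integrals in Example C3-b are easy to confirm numerically. The sketch below uses a plain midpoint rule; the helper `integrate` and the number of subintervals are illustrative choices, and the density and regression function are those found in Example C2-b.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

f_Y = lambda u: 2 * (1 - u)      # marginal density of Y on (0, 1)
e = lambda u: (1 - u) / 2        # e(u) = E[X | Y = u]

p_M = integrate(f_Y, 0.0, 0.5)                              # P(Y <= 0.5)
numerator = integrate(lambda u: e(u) * f_Y(u), 0.0, 0.5)    # E[I_M(Y) e(Y)]
cond_exp = numerator / p_M                                  # E[X | Y <= 0.5]

print(p_M, numerator, cond_exp)
```

Relation B) is what licenses the final division: the integral of e(u) f_Y(u) over M equals E[X | Y in M] P(Y in M).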

In each of the two special cases considered in Sec C2, we have been
able to produce a Borel function $e(\cdot)$ which satisfies the defining relation
A) for conditional expectation. The uniqueness property E6) shows
$e(\cdot)$ to be unique a.s. $[P_Y]$. In Sec C4, we state a number of properties
of conditional expectation which provide the basis for much of its usefulness.
In Sec C7, we provide proofs of these properties based on proposition A)
and properties E1) through E6) for expectation. These properties hold
whenever the appropriate Borel function $e(\cdot)$ exists. Thus, they hold
for the two special cases examined in Sec C2 and for others which can be
derived similarly. It would be convenient if we knew the conditions under
which suitable $e(\cdot)$ exists. As a matter of fact, if we utilize the
powerful existence theorem E10) for mathematical expectation, stated without
proof in Sec A2, we may assert the existence of $e(\cdot)$ for any random
vectors $X$, $Y$ and any real-valued Borel function $g$ such that $E[g(X)]$
is finite. The properties obtained in Sec C7 then hold in any such case.



4. Properties of conditional expectation

In this section, we list the principal properties of conditional
expectation, given a random vector, which are utilized in subsequent
developments. Proofs are given in Sec C7. These are based on the defining
relation A) and properties E1) through E6) for mathematical expectation.

In the following, we suppose, without repeated assertion, that the
random vectors and Borel functions are such that the existence of
ordinary expectations is assured.

We begin the list of properties with the defining condition.
CE1) $e(Y) = E[g(X)|Y]$ a.s. iff $E[I_M(Y)e(Y)] = E[I_M(Y)g(X)]$ for all
Borel sets $M$ in the codomain of $Y$.
As noted in relation B), in Sec C3,

CE1a) If $P(Y \in M) > 0$, then $E[I_M(Y)e(Y)] = E[g(X)|Y \in M]P(Y \in M)$.

If, in CE1), we let $M$ be the entire codomain of $Y$, so that $I_M(Y)$ has
the constant value one for all $\omega$, we obtain the important special case

CE1b) $E[g(X)] = E\{E[g(X)|Y]\}$.
The device of first conditioning by a random vector $Y$ and then taking
expectations is often useful, both in applications and in theoretical
developments. As a simple illustration of the process, we continue an
earlier example.

Example C4-a (Continuation of Example C2-b)

Consider, again, the random variables $X$, $Y$ which produce a joint
distribution which is uniform over the triangular region with vertices (0,0),
(1,0), (0,1). It is shown in Example C2-b that
$f_Y(u) = 2(1 - u)$ for $0 < u < 1$ (and zero elsewhere)
$e(u) = E[X|Y = u] = \frac{1-u}{2}$ for $0 < u < 1$.

By CE1b)

$$E[X] = E[e(Y)] = \int e(u) f_Y(u)\,du = \int_0^1 (1 - u)^2\,du = 1/3.$$

The result could, of course, have been obtained by finding $f_X(t) = 2(1 - t)$
for $0 < t < 1$ and calculating

$$E[X] = \int t f_X(t)\,dt = 2\int_0^1 (t - t^2)\,dt = 1/3.$$

The choice of approach depends upon the objectives and the information
at hand. []
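Both routes to E[X] in this example can be compared numerically. The sketch below is illustrative: `integrate` is a simple midpoint rule written for this purpose, and the densities are those quoted from Example C2-b.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

f_Y = lambda u: 2 * (1 - u)     # marginal density of Y
e = lambda u: (1 - u) / 2       # e(u) = E[X | Y = u]
f_X = lambda t: 2 * (1 - t)     # marginal density of X

# Route 1 (property CE1b): condition on Y first, then take expectation.
via_conditioning = integrate(lambda u: e(u) * f_Y(u), 0.0, 1.0)

# Route 2: integrate t against the marginal density of X directly.
direct = integrate(lambda t: t * f_X(t), 0.0, 1.0)

print(via_conditioning, direct)
```

Which route is less work depends on what is already known: here the first route reuses e(u) and f_Y(u), while the second needs the marginal density of X.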

The next three properties emphasize the integral character of conditional
expectation, since they are in direct parallel with basic properties of
expectations, or integrals. One must be aware, of course, that for
conditional expectation the properties may fail to hold on an exceptional set
of outcomes whose probability is zero. The proofs given in Sec C7 show
how these properties are, in fact, based on corresponding properties of
mathematical expectation.

CE2) Linearity. $E[ag(X) + bh(Y)|Z] = aE[g(X)|Z] + bE[h(Y)|Z]$ a.s. (with
extension by mathematical induction to any finite linear combination).
CE3) Positivity; monotonicity.
$g(X) \geq 0$ a.s. implies $E[g(X)|Y] \geq 0$ a.s.
$g(X) \geq h(Y)$ a.s. implies $E[g(X)|Z] \geq E[h(Y)|Z]$ a.s.
CE4) Monotone convergence. $X_n \to X$ a.s. monotonically implies
$E[X_n|Y] \to E[X|Y]$ a.s. monotonically.

Independence of random vectors is associated with a lack of "conditioning"
in the following sense.

CE5) Independence. The pair $\{X,Y\}$ is independent iff
$E[g(X)|Y] = E[g(X)]$ a.s. for all Borel functions $g$ such that
$E[g(X)]$ is finite, iff
$E[I_N(X)|Y] = E[I_N(X)]$ a.s. for all Borel sets $N$ on the codomain of $X$.



Note that it is insufficient that $E[g(X)|Y] = E[g(X)]$ a.s. for one
specific Borel function $g$. It is relatively easy to establish
counterexamples (see Problem C-5).

Use of linearity, monotone convergence, and approximation of Borel
functions by step functions (simple functions) yields an extension of CE1).

CE6) $e(Y) = E[g(X)|Y]$ a.s. iff $E[h(Y)g(X)] = E[h(Y)e(Y)]$ for all Borel
functions $h$ such that the expectations exist.

The next three properties exhibit distinctive features of conditional
expectation which are the basis of much of their utility. Proofs rest on
previously established properties of mathematical expectation, especially
part a) of [illegible]. We employ these properties repeatedly in subsequent
developments.

CE7) If $X = h(Y)$, then $E[g(X)|Y] = g(X)$ a.s.
CE8) $E[h(Y)g(X)|Y] = h(Y)E[g(X)|Y]$ a.s.

CE9) If $Y = h(W)$, then $E\{E[g(X)|Y]|W\} = E\{E[g(X)|W]|Y\} = E[g(X)|Y]$ a.s.

It occurs frequently that $Y$ is a random vector whose coordinates form a
subset of the coordinates of $W$. Thus, we may consider $W = (Y,Z)$, which
implies $Y$ is a Borel function of $W$, so that

CE9a) $E\{E[g(X)|Y]|Y,Z\} = E\{E[g(X)|Y,Z]|Y\} = E[g(X)|Y]$ a.s.
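Properties CE1b), CE8) and CE9a) can be verified exactly on a small discrete model. Everything in the sketch below is hypothetical and chosen only for the check: the joint mass function on {0,1}^3, the functions `g` and `h`, and the helper `cond_exp`, which builds the conditional expectation as a function of the outcome.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf of (X, Y, Z) on {0,1}^3 with unequal weights summing to one.
weights = {pt: Fraction(w, 36)
           for pt, w in zip(product((0, 1), repeat=3), (1, 2, 3, 4, 5, 6, 7, 8))}

def expect(f):
    """Ordinary expectation of f(X, Y, Z)."""
    return sum(f(*pt) * p for pt, p in weights.items())

def cond_exp(f, key):
    """Return the random variable E[f | key], as a function of the outcome (x, y, z)."""
    def e(*pt):
        k = key(*pt)
        block = [(q, p) for q, p in weights.items() if key(*q) == k]
        mass = sum(p for _, p in block)
        return sum(f(*q) * p for q, p in block) / mass
    return e

g = lambda x, y, z: (x + 1) ** 2      # plays the role of g(X)
h = lambda y: 3 * y + 1               # plays the role of h(Y)
by_Y = lambda x, y, z: y              # conditioning by Y
by_YZ = lambda x, y, z: (y, z)        # conditioning by W = (Y, Z)

e_Y = cond_exp(g, by_Y)                           # E[g(X)|Y]
tower_1 = cond_exp(cond_exp(g, by_YZ), by_Y)      # E{E[g(X)|Y,Z]|Y}
tower_2 = cond_exp(e_Y, by_YZ)                    # E{E[g(X)|Y]|Y,Z}
pulled = cond_exp(lambda x, y, z: h(y) * g(x, y, z), by_Y)   # E[h(Y)g(X)|Y]

for pt in weights:
    print(pt, e_Y(*pt), tower_1(*pt), tower_2(*pt), pulled(*pt))
```

On a finite space with every point carrying positive mass there is no exceptional set, so the "a.s." qualifications reduce to equality at every outcome.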
If the function $h$ in CE9) has a Borel inverse, then $W = h^{-1}(Y)$, so
that the roles of $Y$ and $W$ are interchangeable. Thus, we may assert
CE9b) If $Y = h(W)$, where $h$ is Borel with a Borel inverse,
then $E[g(X)|Y] = E[g(X)|W]$ a.s.
We note two special cases of CE9b). If the coordinates of $Y$ are obtained
by a permutation of the coordinates of $W$, then $Y = h(W)$, where $h$ is
one-one, onto, and continuous, hence Borel with Borel inverse. Thus,
conditioning by a random vector does not depend upon the particular ordering
of the coordinates. If we have a pair of random vectors $\{X,Y\}$ which do
not share any coordinates, then conditioning by the pair is understood as
conditioning by the random vector $(X,Y)$ whose coordinates consist of
the combined set of coordinates of the two random vectors (in any order).
In a similar manner, we can consider two random vectors which may have
some coordinates in common. Conditioning by such a pair is understood as
conditioning by a random vector whose coordinates consist of the combined
set of distinct coordinates. For example, suppose $X = (X_1,X_2,X_3)$ and
$Y = (X_1,X_3,X_4)$. Then conditioning by $X$, $Y$ is conditioning by
$W = (X_1,X_2,X_3,X_4)$. It is apparent these ideas extend to larger combinations
of vectors.

The next result is so plausible that it is frequently taken to be
self-evident. Although it is easily established in certain simple cases, it is
somewhat difficult to establish in the general case, as noted in Sec C7.
It is extremely useful in the Borel function form, as follows.
CE10) Suppose $g$ is a Borel function such that $E[g(X,v)]$ is finite for
all $v$ in the range of $Y$ and $E[g(X,Y)]$ is finite. Then
$E[g(X,Y)|Y = u] = E[g(X,u)|Y = u]$ a.s. $[P_Y]$.
In the independent case, CE10) takes a useful form.
CE11) If the pair $\{X,Y\}$ in CE10) is independent, then
$E[g(X,Y)|Y = u] = E[g(X,u)]$ a.s. $[P_Y]$.

Among the inequalities for expectations which can be extended to
conditional expectations, the following are useful in many applications.

CE12) Triangle inequality. $|E[g(X)|Y]| \leq E[|g(X)|\,|Y]$ a.s.

CE13) Jensen's inequality. If $g$ is a convex function on an interval $I$
which contains the range of real random variable $X$, then
$g(E[X|Y]) \leq E[g(X)|Y]$ a.s.

Establishment of inequalities for conditional expectation (as for expectation)
depends upon setting up the appropriate inequalities for random variables,
then utilizing monotonicity CE3). The inequalities on the random variables
are often expressions of classical inequalities in ordinary analysis. As
in the case of expectations, monotone convergence plays a key role in
establishing analogs of Fatou's lemma, dominated convergence, and countable
additivity.



/ 

/ 



V 



\ 



72 ' 



, 0 
ERIC 



. **5. Conditional distributions , # 4 

The introductory treatment of the special cases of conditional expectation
in Sec C2 utilizes the notion of conditional distribution. In Sec C3,
however, we disregard this notion in developing the general concept of
conditional expectation, given a random vector. In the present section, we show
that conditional probability and conditional distributions can be treated
as special cases of conditional expectation.

By properties CE1b) and E1a),
$$E\{I_M(Y) E[I_N(X) \mid Y]\} = \int_M E[I_N(X) \mid Y = u]\, dF_Y(u)$$
$$= E[I_N(X) \mid Y \in M]\, P(Y \in M) = P(X \in N \mid Y \in M)\, P(Y \in M).$$
This leads naturally to the

DEFINITION. $P(X \in N \mid Y = u) = E[I_N(X) \mid Y = u]$ a.s.

If X is real-valued and $N_t = (-\infty, t]$, then we set
$$F_{X|Y}(t \mid u) = P(X \le t \mid Y = u) = E[I_{N_t}(X) \mid Y = u] \quad \text{a.s.}$$

For each fixed t, this defines a Borel function of u with properties
which suggest that for each fixed u in the range of Y the function
$F_{X|Y}(\cdot \mid u)$ should be a distribution function. One property of interest
is the following.
$$P(X \le t, Y \in M) = E[I_{N_t}(X) I_M(Y)] = E\{I_M(Y) E[I_{N_t}(X) \mid Y]\} = \int_M F_{X|Y}(t \mid u)\, dF_Y(u)$$
from which it follows as a special case that
$$F_X(t) = E\{E[I_{N_t}(X) \mid Y]\} = \int F_{X|Y}(t \mid u)\, dF_Y(u).$$
This last equality is often known as the law of total probability, since
it appears as a generalization of a rule known by that name,
$$P(A) = \sum_i P(A \mid B_i) P(B_i), \quad \text{where } A \subset \bigcup_i B_i \text{ and the } B_i \text{ are mutually exclusive.}$$

* The material in this section is not needed in the subsequent sections
and may be omitted without loss of continuity.



There are some technical difficulties in dealing with $F_{X|Y}(\cdot \mid u)$ as
a distribution function. These arise because for each real t there
is an exceptional set of u of $P_Y$ measure zero. That is,
$F_{X|Y}(t \mid u) = P(X \le t \mid Y = u)$ a.s. $[P_Y]$. Since there is an uncountable
infinity of real numbers t, certain problems can arise with which we are not
equipped to deal. In the case of joint density functions or of jointly discrete
random variables, the motivating treatment of Sec C2 indicates the problem
may be solved. For a real random variable X, a distribution function
is determined by its values on the rationals, which involves only a
countable infinity of values. Thus, it is known that for real random
variable X and any random vector Y there is a regular conditional
distribution function, given Y, with the properties

1) $F_{X|Y}(\cdot \mid u)$ is a distribution function for a.e. u $[P_Y]$,
2) For each real t, $F_{X|Y}(t \mid u) = P(X \le t \mid Y = u)$ for a.e. u $[P_Y]$,
3) $E[g(X) \mid Y = u] = \int g(t)\, dF_{X|Y}(t \mid u)$ for a.e. u $[P_Y]$.

In some cases, for a.e. fixed u, $F_{X|Y}(\cdot \mid u)$ is differentiable and
the function $f_{X|Y}(\cdot \mid u)$ defined by
$$f_{X|Y}(t \mid u) = \frac{\partial}{\partial t} F_{X|Y}(t \mid u)$$
is a conditional density function for X, given Y = u. This agrees
with the conditional density function introduced in Sec C2.

As an important example of the use of these ideas, consider the
problem of determining the distribution for the sum Z = X + Y of two
random variables X, Y. If we let $Q_v = \{(t,u): t + u \le v\}$ (see Fig. C5-1),
then
$$F_Z(v) = P(X + Y \le v) = P[(X,Y) \in Q_v] = E[I_{Q_v}(X,Y)] = E\{E[I_{Q_v}(X,Y) \mid Y]\}$$
$$= \int E[I_{Q_v}(X,u) \mid Y = u]\, dF_Y(u) \quad \text{by CE10)}.$$



For each fixed v, $(t,u) \in Q_v$ iff $t + u \le v$ iff $t \le v - u$ iff
$t \in N_{v-u}$. Hence, $I_{Q_v}(X,u) = I_{N_{v-u}}(X)$, so that
$$E[I_{Q_v}(X,u) \mid Y = u] = F_{X|Y}(v - u \mid u).$$
If $F_{X|Y}$ is a regular conditional distribution, then
$$F_Z(v) = \int F_{X|Y}(v - u \mid u)\, dF_Y(u).$$
If {X,Y} is an independent pair, then $F_{X|Y}(v - u \mid u) = F_X(v - u)$ a.s. $[P_Y]$,
so that
$$F_Z(v) = \int F_X(v - u)\, dF_Y(u).$$
This last combination is known as the convolution of $F_X$ with $F_Y$.
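The convolution formula lends itself to direct numerical evaluation. The sketch below assumes, purely for illustration, that X and Y are independent and uniform on (0, 1), so that $dF_Y(u) = du$ on that interval; the integral is approximated by a midpoint rule and compared with the known triangular-distribution result. The function names are hypothetical.

```python
def F_uniform(t):
    """Distribution function of a uniform(0, 1) random variable."""
    return min(max(t, 0.0), 1.0)


def F_sum(v, n=2000):
    """Midpoint-rule approximation of the convolution integral
    F_Z(v) = integral of F_X(v - u) dF_Y(u), with X, Y uniform on (0, 1)."""
    h = 1.0 / n
    return sum(F_uniform(v - (k + 0.5) * h) for k in range(n)) * h


def F_sum_exact(v):
    """Closed form for the sum of two independent uniform(0, 1) variables."""
    if v <= 0.0:
        return 0.0
    if v <= 1.0:
        return v * v / 2.0
    if v <= 2.0:
        return 1.0 - (2.0 - v) ** 2 / 2.0
    return 1.0
```

For example, `F_sum(0.5)` should be close to `F_sum_exact(0.5)`, which is 0.125.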



6. Conditional distributions and Bayes' theorem

We suppose a regular conditional distribution has been determined.
It frequently is necessary to reverse the conditioning, as in Bayes' theorem
for events. In the following we treat X, Y as real-valued. Extensions
to the vector-valued cases are immediate.

a) If both X, Y are discrete, there is no problem. If we let
$p_{X|Y}(t_i \mid u_j) = P(X = t_i \mid Y = u_j)$, and similarly for the other cases, then
$$p_{Y|X}(u_j \mid t_i) = \frac{p_{X|Y}(t_i \mid u_j)\, p_Y(u_j)}{p_X(t_i)}.$$

b) If there is a joint density function, then by definition
$$f_{Y|X}(u \mid t) = \frac{f_{X|Y}(t \mid u)\, f_Y(u)}{f_X(t)} \quad \text{for } f_X(t) > 0.$$

c) Suppose X is discrete and Y is absolutely continuous. Then
$$F_{Y|X}(u \mid t_i) = P(Y \le u \mid X = t_i) = \frac{E[I_{N_u}(Y) I_{\{t_i\}}(X)]}{E[I_{\{t_i\}}(X)]}$$
$$= \frac{E\{I_{N_u}(Y) E[I_{\{t_i\}}(X) \mid Y]\}}{E\{E[I_{\{t_i\}}(X) \mid Y]\}} \quad \text{by CE1)}$$
$$= \frac{\int_{-\infty}^{u} P(X = t_i \mid Y = v) f_Y(v)\, dv}{\int P(X = t_i \mid Y = v) f_Y(v)\, dv}.$$
Differentiation by u gives
$$f_{Y|X}(u \mid t_i) = \frac{P(X = t_i \mid Y = u)\, f_Y(u)}{P(X = t_i)}.$$
Simple algebraic manipulation gives
$$P(X = t_i \mid Y = u) = \frac{f_{Y|X}(u \mid t_i)\, P(X = t_i)}{f_Y(u)} \quad \text{for } f_Y(u) > 0.$$
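Case c) can be illustrated with a small numerical sketch. The model below is an assumption made only for this example: Y is uniform on (0, 1) and $P(X = 1 \mid Y = u) = u$. The code computes $P(X = 1)$ by integration, forms the reversed conditional density, and then recovers the original conditional probability.

```python
def f_Y(u):
    """Hypothetical density of Y: uniform on (0, 1)."""
    return 1.0 if 0.0 < u < 1.0 else 0.0


def p_X1_given_Y(u):
    """Hypothetical conditional probability P(X = 1 | Y = u) = u."""
    return u


def integrate(func, a, b, n=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h


# Denominator of case c): P(X = 1) = integral of P(X = 1 | Y = v) f_Y(v) dv.
p_X1 = integrate(lambda v: p_X1_given_Y(v) * f_Y(v), 0.0, 1.0)


def f_Y_given_X1(u):
    """Reversed conditioning: f_{Y|X}(u | 1) = P(X = 1 | Y = u) f_Y(u) / P(X = 1)."""
    return p_X1_given_Y(u) * f_Y(u) / p_X1


def recovered_p_X1_given_Y(u):
    """Final formula of case c), valid where f_Y(u) > 0."""
    return f_Y_given_X1(u) * p_X1 / f_Y(u)
```

Under these assumptions $P(X = 1) = 1/2$ and $f_{Y|X}(u \mid 1) = 2u$ on (0, 1), which the sketch reproduces numerically.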




7. Proofs of the properties of conditional expectation 

In this section, we show that if $e(\cdot)$ is a Borel function which
satisfies the defining relation (A) $E[I_M(Y)e(Y)] = E[I_M(Y)g(X)]$ for
any Borel set M on the codomain of Y, then properties CE2) through
CE13) hold for $e(Y) = E[g(X) \mid Y]$. Note that when we write $e(Y) = E[g(X) \mid Y]$,
we are asserting that $e(\cdot)$ satisfies the defining relation (A) and must
therefore be unique a.s. $[P_Y]$.

In the proofs, we employ the properties E1) through E6) of
mathematical expectation. Actually, we need only the simpler part a)
of property E6). Note that the proofs do not involve the complexities
of conditional distributions. The reader who wishes to go through the
proofs carefully may wish to use the summary of properties of mathematical
expectation in Appendix I. A tally of the use of these properties might
be instructive. To simplify writing, we drop the "a.s." in many places.

At several places, the arguments require an acquaintance with
measure-theoretic ideas beyond that assumed of most readers. In these
instances, we sketch the ideas of the proofs, in order to indicate to
the interested reader what to look for in seeking a more complete treatment.
The goal is insight into the mathematical structure as an aid to
interpretation and application.
CE2) Linearity.

Let $e_1(Z) = E[g(X) \mid Z]$, $e_2(Z) = E[h(Y) \mid Z]$, $e(Z) = E[ag(X) + bh(Y) \mid Z]$.
For any Borel set M in the codomain of Z, we have
$$E\{I_M(Z)[ag(X) + bh(Y)]\} = E[I_M(Z)e(Z)] \quad \text{by CE1)}.$$
Also
$$E\{I_M(Z)[ag(X) + bh(Y)]\} = aE[I_M(Z)g(X)] + bE[I_M(Z)h(Y)] \quad \text{by E2)}$$
$$= aE[I_M(Z)e_1(Z)] + bE[I_M(Z)e_2(Z)] \quad \text{by CE1)}$$
$$= E\{I_M(Z)[ae_1(Z) + be_2(Z)]\} \quad \text{by E2)}.$$
By E6), we have $e(Z) = ae_1(Z) + be_2(Z)$ a.s. $\square$
CE3) Positivity; monotonicity.

$g(X) \ge 0$ a.s. implies $E[I_M(Y)g(X)] \ge 0$ by E3)
implies $E[I_M(Y)e(Y)] \ge 0$ by CE1).

Suppose $e(Y) < 0$ for $\omega \in A$. Then there is a Borel set $M_0$ with
$A = Y^{-1}(M_0)$. Thus, $I_{M_0}(Y)e(Y) = I_A e(Y) \le 0$. By E3), we have
$E[I_{M_0}(Y)e(Y)] \le 0$, with equality iff $I_A e(Y) = 0$ a.s. But this requires
$P(A) = 0$, which is equivalent to the condition $e(Y) \ge 0$ a.s.
Monotonicity follows from positivity and linearity. $\square$

CE4) Monotone convergence.

Consider the nondecreasing case $X_n \uparrow X$ a.s. Put $e_n(Y) = E[X_n \mid Y]$ and
$e(Y) = E[X \mid Y]$. Then by CE3), $e_n(Y) \le e_{n+1}(Y) \le e(Y)$ a.s., all $n \ge 1$.
The almost-sure restriction means that we can neglect an event (set of $\omega$)
of zero probability and have the indicated relationship for all other $\omega$.
By ordinary rules of limits, for any $\omega$ other than the exceptional set,
we have $e^*(Y) = \lim e_n(Y) \le e(Y)$, which means the inequalities hold a.s.
For any Borel set M, $I_M(Y)X_n \uparrow I_M(Y)X$ a.s., and $I_M(Y)e_n(Y) \uparrow I_M(Y)e^*(Y)$ a.s.,
so that by monotone convergence for expectation,
$$E[I_M(Y)e_n(Y)] = E[I_M(Y)X_n] \uparrow E[I_M(Y)X] = E[I_M(Y)e(Y)] \quad \text{and} \quad E[I_M(Y)e_n(Y)] \uparrow E[I_M(Y)e^*(Y)].$$
Hence, $E[I_M(Y)e^*(Y)] = E[I_M(Y)e(Y)]$ for all Borel sets M on the codomain of
Y. This ensures $e^*(Y) = e(Y)$ a.s., by E6). $\square$

CE5) Independence. a) {X,Y} is independent iff b) $E[I_N(X) \mid Y] = E[I_N(X)]$ a.s.
for all Borel N iff c) $E[g(X) \mid Y] = E[g(X)]$ a.s. for all Borel functions g.

a) ⇒ c) $\{g(X), I_M(Y)\}$ is independent; hence
$$E[I_M(Y)g(X)] = E[I_M(Y)]E[g(X)] \quad \text{by E5)}$$
$$= E\{E[g(X)]I_M(Y)\} \quad (E[g(X)] \text{ a constant}) \quad \text{by E2)}$$
$$E[I_M(Y)g(X)] = E[I_M(Y)e(Y)] \quad \text{by CE1)}.$$
Since the constant $E[g(X)]$ is a Borel function of Y, we conclude by E6)
that $e(Y) = E[g(X)]$ a.s.

c) ⇒ b), since b) is a special case.

b) ⇒ a) For any Borel sets M, N on the codomains of X, Y, respectively,
$$E[I_M(X)I_N(Y)] = E\{I_N(Y)E[I_M(X)]\} \quad \text{by hyp. and CE1)}$$
$$= E[I_M(X)]E[I_N(Y)] \quad \text{by E2)}$$
which ensures independence of {X,Y} by E5). $\square$

CE6) Extension of CE1) to general Borel functions.

First we suppose $g \ge 0$. By positivity CE3), we have $e(Y) \ge 0$ a.s.

1) By CE1), the proposition is true for $h = I_M$.

2) By linearity CE2), the proposition is true for any simple function
$h = \sum_{i=1}^{m} t_i I_{M_i}$.

3) For $h \ge 0$, there is a sequence of simple functions $h_n \uparrow h$. This
implies $h_n(Y)g(X) \uparrow h(Y)g(X)$ and $h_n(Y)e(Y) \uparrow h(Y)e(Y)$ a.s. Hence,
by monotone convergence E4), for expectations,
$E[h_n(Y)g(X)] \uparrow E[h(Y)g(X)]$ and $E[h_n(Y)e(Y)] \uparrow E[h(Y)e(Y)]$.
Since for each n, $E[h_n(Y)g(X)] = E[h_n(Y)e(Y)]$, the limits must be
the same.

4) For general Borel h, we have $h = h_+ - h_-$, where both $h_+$ and $h_-$
are nonnegative Borel functions. By linearity and 3), we have
$$E[h(Y)g(X)] = E[h_+(Y)g(X)] - E[h_-(Y)g(X)] = E[h_+(Y)e(Y)] - E[h_-(Y)e(Y)] = E[h(Y)e(Y)].$$

5) For general Borel g, we have $g = g_+ - g_-$, where both $g_+$ and $g_-$
are nonnegative Borel functions. By linearity and 4), we have
$$E[h(Y)g(X)] = E[h(Y)g_+(X)] - E[h(Y)g_-(X)] = E[h(Y)e_+(Y)] - E[h(Y)e_-(Y)] = E[h(Y)e(Y)],$$
where $e_+(Y) = E[g_+(X) \mid Y]$, $e_-(Y) = E[g_-(X) \mid Y]$, and
$e(Y) = e_+(Y) - e_-(Y)$ a.s., by CE2). $\square$

CE7) If $X = h(Y)$, then $E[g(X) \mid Y] = g(X)$ a.s.

$g(X) = g[h(Y)] = h^*(Y)$, with $h^*$ Borel. For any Borel set M,
$E[I_M(Y)g(X)] = E[I_M(Y)h^*(Y)] = E[I_M(Y)e(Y)]$ by CE1).
But this ensures $h^*(Y) = e(Y)$ a.s. by E6). $\square$

CE8) $E[h(Y)g(X) \mid Y] = h(Y)E[g(X) \mid Y]$ a.s.

For any Borel set M, $I_M(Y)h(Y)$ is a Borel function of Y. Set
$e(Y) = E[g(X) \mid Y]$ and $e^*(Y) = E[h(Y)g(X) \mid Y]$.
Now $E[I_M(Y)h(Y)g(X)] = E[I_M(Y)h(Y)e(Y)]$ by CE6)
and $E[I_M(Y)h(Y)g(X)] = E[I_M(Y)e^*(Y)]$ by CE1).
Hence, $h(Y)e(Y) = e^*(Y)$ a.s. by E6). $\square$

CE9) If $Y = h(W)$, then $E\{E[g(X) \mid Y] \mid W\} = E\{E[g(X) \mid W] \mid Y\} = E[g(X) \mid Y]$ a.s.

Set $e(Y) = E[g(X) \mid Y] = e[h(W)] = h^*(W)$ and $e^*(W) = E[g(X) \mid W]$.
Then $E\{E[g(X) \mid Y] \mid W\} = E[h^*(W) \mid W] = h^*(W) = e(Y)$ by CE7).
For any Borel set M on the codomain of Y, let $N = h^{-1}(M)$. By Theorem
A1-1, $I_M(Y) = I_N(W)$. Repeated use of CE1) gives
$$E[I_M(Y)g(X)] = E[I_M(Y)e(Y)]$$
$$= E[I_N(W)g(X)] = E[I_N(W)e^*(W)] = E[I_M(Y)e^*(W)]$$
$$= E\{I_M(Y)E[e^*(W) \mid Y]\}.$$
Hence $e(Y) = E[e^*(W) \mid Y]$ a.s. by E6). $\square$

Proof of CE10) requires some results of measure theory beyond the
scope of the present work. We establish the proposition first for the
special case that X, Y are real-valued, with joint density function $f_{XY}$;
then we sketch the ideas of a general proof.

CE10) $E[g(X,Y) \mid Y = u] = E[g(X,u) \mid Y = u]$ a.s. $[P_Y]$.

PROOF FOR SPECIAL CASE. X, Y have joint density $f_{XY}$.

Let $f_{X|Y}$ be defined as in Sec C2. Put $e(u,v) = E[g(X,v) \mid Y = u]$ and
$e^*(u) = E[g(X,Y) \mid Y = u]$. Then $E[I_M(Y)g(X,Y)] = E[I_M(Y)e^*(Y)]$ ∀ Borel
set M. This is equivalent to
$$\iint I_M(u)\, g(t,u)\, f_{XY}(t,u)\, dt\, du = \int I_M(u)\, e^*(u)\, f_Y(u)\, du.$$
The left-hand integral may be written
$$\int I_M(u) \left[\int g(t,u)\, f_{X|Y}(t \mid u)\, dt\right] f_Y(u)\, du = \int I_M(u)\, e(u,u)\, f_Y(u)\, du.$$
Thus,
$$\int I_M(u)\, e(u,u)\, f_Y(u)\, du = \int I_M(u)\, e^*(u)\, f_Y(u)\, du$$
or, equivalently, $E[I_M(Y)e(Y,Y)] = E[I_M(Y)e^*(Y)]$ ∀ Borel set M.
We conclude $e(Y,Y) = e^*(Y)$ a.s. $\square$

IDEA OF A GENERAL PROOF

If the theorem can be established for $g(t,u) = I_Q(t,u)$, where Q is
any Borel set on the codomain of (X,Y), then a "standard argument"
such as used in the proof of CE6) extends the theorem to any Borel
function g such that $E[g(X,Y)]$ is finite.

We first consider Borel sets of the form $Q = M \times N$, where M, N are
Borel sets in the codomains of X, Y, respectively. Then $I_Q(t,u) = I_M(t)I_N(u)$.
Let $e(u,v) = E[g(X,v) \mid Y = u] = E[I_M(X)I_N(v) \mid Y = u] = I_N(v)E[I_M(X) \mid Y = u]$
and $e^*(u) = E[g(X,Y) \mid Y = u] = E[I_M(X)I_N(Y) \mid Y = u]$.
Now $e(Y,Y) = I_N(Y)E[I_M(X) \mid Y]$ and $e^*(Y) = I_N(Y)E[I_M(X) \mid Y]$ a.s. by CE8).
Hence, $e(Y,Y) = e^*(Y)$ a.s. or $e(u,u) = e^*(u)$ a.s. $[P_Y]$.

By linearity, the equality holds for any Borel set $Q = \bigcup_{i=1}^{n} M_i \times N_i$
(a disjoint union), since in this case $I_Q = \sum_{i=1}^{n} I_{M_i} I_{N_i}$. Hence, equality
holds for the class consisting of all finite, disjoint unions of sets of the
form $M \times N$. A number of arguments may be used to show that equality holds
for all Borel sets Q. We sketch one proof. It is known that this class is a field
and that the minimal sigma field which contains it is the class $\mathcal{B}$ of
Borel sets. Let $\mathcal{B}^*$ be the class of sets for which equality holds. If
$\{Q_i: 1 \le i\}$ is a monotone class of sets in $\mathcal{B}^*$, the sequence $\{I_{Q_i}: 1 \le i\}$
is a monotone sequence of Borel functions. Use of monotone convergence
E4) for expectations shows that equality holds for $I_Q$, where Q is
the limit of the sequence $Q_i$; thus, $\mathcal{B}^*$ is a monotone class. By a
well known theorem, $\mathcal{B}^*$ must include $\mathcal{B}$. This means that equality holds
for every Borel set Q. $\square$

CE11) If {X,Y} is independent, $E[g(X,Y) \mid Y = u] = E[g(X,u)]$ a.s. $[P_Y]$.

By CE5), independence of {X,Y} ensures $e(u,v) = E[g(X,v) \mid Y = u] = E[g(X,v)]$
a.s. $[P_Y]$, so $e^*(u) = e(u,u) = E[g(X,u)]$ a.s. $[P_Y]$. $\square$

CE12) Triangle inequality.

Since $g(X) \le |g(X)|$, we have $E[g(X) \mid Y] \le E[\,|g(X)| \mid Y]$ a.s. by CE3).
Since $-g(X) \le |g(X)|$, we have $-E[g(X) \mid Y] \le E[\,|g(X)| \mid Y]$ a.s. by CE3), CE2).
Hence, $|E[g(X) \mid Y]| \le E[\,|g(X)| \mid Y]$ a.s. $\square$

CE13) Jensen's inequality.

Convex function g satisfies $g(t) \ge g(s) + \lambda(s)(t - s)$, where $\lambda$ is
a nondecreasing function. Set $e(Y) = E[X \mid Y]$. Then
$g(X) \ge g[e(Y)] + \lambda[e(Y)][X - e(Y)]$. If we take conditional expectation,
$$E[g(X) \mid Y] \ge E\{g[e(Y)] + \lambda[e(Y)][X - e(Y)] \mid Y\} \quad \text{a.s. by CE3)}$$
$$= E\{g[e(Y)] \mid Y\} + E\{\lambda[e(Y)]X \mid Y\} - E\{\lambda[e(Y)]e(Y) \mid Y\} \quad \text{by CE2)}$$
$$= g[e(Y)] + \lambda[e(Y)]e(Y) - \lambda[e(Y)]e(Y) \quad \text{by CE7), CE8)}$$
$$= g\{E[X \mid Y]\} \quad \text{a.s.} \quad \square$$

8. Problems



C-1 Prove parts b) and c) of Theorem C1-1.

C-2 Suppose X, Y have joint density function $f_{XY}$. Use Theorem C1-2
and property E1a) for expectation to show that

a) If $P(Y \in M) > 0$, then
$$E[g(X) \mid Y \in M] = \frac{\int_M \left[\int g(t) f_{XY}(t,u)\, dt\right] du}{\int_M \left[\int f_{XY}(t,u)\, dt\right] du}.$$

b) If $P(X \in N) > 0$, then
$$E[g(X) \mid X \in N] = \frac{\int_N g(t) f_X(t)\, dt}{\int_N f_X(t)\, dt}.$$

C-3 Show $E[g(X) \mid A] = E[g(X) \mid AB]P(B \mid A) + E[g(X) \mid AB^c]P(B^c \mid A)$.

C-4 If X is discrete and Y is absolutely continuous, then the joint
distribution can be described by a hybrid mass-density function $f_{XY}$
such that $P(X = t_i, Y \in M) = \int_M f_{XY}(t_i,u)\, du$. Develop an expression
for $e(u) = E[g(X) \mid Y = u]$ in this case.

C-5 /# Let X- 01^ + 21 ^ + 31^ and Y - 1^ + 3 I fi * (canonical form), 
' \ with joint probability' distribution such that j^B^^l^ , * 

P(A 2 B 2 >' » and H*^\) « 1/3/ Shpw that E[x|y = l] » lT[x[x - 3l 

but that {X,Y) is not independent. I * 0 * * * * 
C-6 Show that for X real, the triangle inequality is a special case
of Jensen's inequality.

C-7 Suppose .'{X,YJ is- independent, and ea^Sh random variable is uniform 
^ on [-1, f]. Let Z « g(X,Y) be given by, * ' ' ° 

(X^for X 2 + Y 2 < 1 • 
c for X 2 + Y 2 , > JL. \ ' ' * 

'". Detenaine E[Z|X 2 + Y 2 <{] and E [z|x 2 + Y 2 >. l] . 'use these results 

to determine B fzj . , 
C-8 X,Y have joint densUy function ^0^*) « J- tu for 1* < t < u < 2 

(and zerp elsewhere). Determine * ; 

a) * Etx 2 i+ Y 2 |X -<t] *) E[XY|X - f\ c) E [xjx < iff + 1)] ' v 




C-9 The pair {X,Y} is independent, with $f_X(t) = f_Y(t) = 1/2$ for
$-1 < t < 0$, $= \frac{1}{2}e^{-t}$ for $0 < t$ (and zero elsewhere). Determine
a) $E[X^2 + Y^2 \mid X = t]$ b) $E[XY \mid X = t]$.

C-10 Use the fact that $g(X,Y) = g^*(X,Y,Z)$, with $g^*$ Borel if g is,
to establish the following extension of CE10):
$$E[g(X,Y) \mid Y = u, Z = v] = E[g(X,u) \mid Y = u, Z = v] \quad \text{a.s. } [P_{YZ}].$$

C-11 Use CE9a) and the result of problem C-10 to show that if $F_{Y|Z}$ is
a regular conditional distribution function, then
$$E[g(X,Y) \mid Z = v] = \int E[g(X,u) \mid Y = u, Z = v]\, dF_{Y|Z}(u \mid v) \quad \text{a.s. } [P_Z].$$

C-12 Suppose X is a real random variable with $E[X^2]$ finite. Let
$e(Y) = E[X \mid Y]$ and $v(Y) = \mathrm{Var}[X \mid Y] = E\{[X - e(Y)]^2 \mid Y\}$. Show that

a) $v(Y) = E[X^2 \mid Y] - E^2[X \mid Y] = E[X^2 \mid Y] - e^2(Y)$

b) $\mathrm{Var}[e(Y)] = E[e^2(Y)] - E^2[X] = E(E^2\{X - E[X] \mid Y\})$

c) $\mathrm{Var}[X] = E[v(Y)] + \mathrm{Var}[e(Y)] = E\{\mathrm{Var}[X \mid Y]\} + \mathrm{Var}\{E[X \mid Y]\}$.
C-13 The following is a model for the demand of a random number of
"customers", who act independently but with the same individual
demand probabilities. Suppose

i) $\{X_k: 1 \le k\}$ is iid (independent, identically distributed),
with $E[X_k^2]$ finite (individual demands).

ii) N is a nonnegative, integer-valued random variable with
$E[N^2]$ finite (number of customers).

iii) $\{N, X_k: 1 \le k\}$ is an independent class.

iv) $D = 0$ for $N = 0$, and $D = \sum_{k=1}^{n} X_k = Y_n$ for $N = n > 0$ (composite demand).

If $A_n = \{\omega: N(\omega) = n\}$, then $D = \sum_{n=0}^{\infty} I_{A_n} Y_n = \sum_{n=0}^{\infty} I_{\{n\}}(N)\, Y_n$, with $Y_0 = 0$.

a) Show $E[D \mid N = n] = nE[X]$ and $\mathrm{Var}[D \mid N = n] = n\mathrm{Var}[X]$.
Note. $E[D \mid N = n] = E[I_{\{n\}}(N)D]/P(N = n)$, etc.

b) Show $E[D] = E[N]E[X]$.

c) Use the result of problem C-12 to show
$\mathrm{Var}[D] = E[N]\mathrm{Var}[X] + \mathrm{Var}[N]E^2[X]$.

d) Suppose N is Poisson ($\lambda$) and X is uniform on $[0, a]$.
Calculate $E[D]$ and $\mathrm{Var}[D]$.



C-14 The characteristic function $\varphi_X$ for a real random variable X is
defined by $\varphi_X(u) = E[e^{iuX}]$ for all real u (i is the complex imaginary
unit, $i^2 = -1$). The generating function $g_N$ for a nonnegative,
integer-valued random variable N is $g_N(s) = E[s^N] = \sum_{k=0}^{\infty} s^k P(N = k)$,
defined at least for $|s| \le 1$, although possibly for a much larger
domain. It is readily shown that addition of a finite number of
members of an independent class of random variables corresponds to
multiplying their characteristic functions (or their generating
functions, if they exist). Consider the composite random variable
D in problem C-13.

a) Show that $\varphi_D(u) = g_N[\varphi_X(u)]$, where $g_N$ is the generating
function for N and $\varphi_X$ is the common characteristic function
for the $X_k$. [Suggestion. Condition by N, then take expectation.
$E[e^{iuD} \mid N = n] = E[e^{iuY_n}]$.]

b) Show that if the $X_k$ are nonnegative, integer-valued with common
generating function $g_X$, then $g_D(s) = g_N[g_X(s)]$.

c) Suppose N is Poisson ($\lambda$). Show that $g_N(s) = \exp[\lambda(s - 1)]$,
so that $\varphi_D(u) = \exp\{\lambda[\varphi_X(u) - 1]\}$.
[$P(N = k) = e^{-\lambda}\lambda^k/k!$.]
C-15 The correlation ratio of X with respect to Y (see Renyi [1970])
is $K[X \mid Y] = \sigma[e(Y)]/\sigma[X]$ (see Problem C-12). Show
that the following properties hold:

a) $0 \le K[X \mid Y] \le 1$

b) If {X,Y} is independent, then $K[X \mid Y] = 0$.

c) $K[X \mid Y] = 1$ iff there is a Borel function g with $X = g(Y)$.

d) $K^2[X \mid Y] = \sup \rho^2[X, g(Y)]$, where g ranges over the set of Borel
functions such that $E[g^2(Y)]$ is finite and $\rho$ is the correlation
coefficient for X and g(Y).

e) $K^2[X \mid Y] = \rho^2[X, g(Y)]$ iff there exist a, b ($a \ne 0$) such
that $g(Y) = ae(Y) + b$ a.s.

Suggestion. For d), e), use CE1b) and Schwarz' inequality. Work
with standardized random variables obtained by subtracting the mean
and dividing by the standard deviation.

C-16 Suppose f and g are Borel functions such that $E[f(X)] = E[g(Y)] = 0$,
$\mathrm{Var}[f(X)] = \mathrm{Var}[g(Y)] = 1$, and $E[f(X)g(Y)] = \sup E[\varphi(X)\psi(Y)] = \lambda$.
Show that

a) $E[f(X) \mid Y] = \lambda g(Y)$ a.s. and $E[g(Y) \mid X] = \lambda f(X)$ a.s.

b) $E\{E[f(X) \mid Y] \mid X\} = \lambda^2 f(X)$ a.s. and $E\{E[g(Y) \mid X] \mid Y\} = \lambda^2 g(Y)$ a.s.

c) $E[f(X) \mid g(Y)] = \lambda g(Y)$ a.s. and $E[g(Y) \mid f(X)] = \lambda f(X)$ a.s.

Suggestion. Use CE1b) and Schwarz' inequality.




D. CONDITIONAL INDEPENDENCE, GIVEN A RANDOM VECTOR 

1. The Concept and Some Basic Properties

2. Some Elements of Bayesian Analysis

3. A One-Stage Bayesian Decisional Model

4. A Dynamic-Programming Example

5. Proofs of the Basic Properties

6. Problems



D. Conditional independence, given a random vector

The concept of conditional independence of random vectors which we
consider in the following sections has been utilized widely in advanced
treatments of Markov processes. In such treatments, the concept is usually
expressed in terms of conditional expectation, given a sigma field of events.
Our immediate goal is to reformulate the essential features of such treatments
in terms of the more readily accessible conditional expectation, given
a random vector, developed in the preceding sections. We then illustrate
the usefulness of the conditional independence concept by showing how it
appears naturally in certain problems in decision theory. In Sec E1 and
following, we apply the concept to Markov processes.

1. The concept and some basic propositions

Although historically there seems to be no connection, it may be instructive
to consider how the concept of conditional independence, given a random
vector, may be seen as an extension of the simpler concept of conditional
independence, given an event. Suppose {A,B} is conditionally independent,
given C, with $A = X^{-1}(M)$ and $B = Y^{-1}(N)$. Then the product rule
$P(AB \mid C) = P(A \mid C)P(B \mid C)$ may be expressed
$$E[I_M(X)I_N(Y) \mid C] = E[I_M(X) \mid C]\,E[I_N(Y) \mid C].$$
If this rule holds for all Borel sets M, N on the codomains of X, Y,
respectively, we should be inclined to say {X,Y} is conditionally independent,
given C. Suppose Z is a simple random variable, with $C_k = \{Z = z_k\}$.
Then, in these terms, we would say {X,Y} is conditionally independent,
given $Z = z_k$. If the product rule holds for all Borel sets M, N in
the codomains of X, Y, respectively, and for all $z_k$ in the range of Z,
we should then be inclined to say the pair {X,Y} is conditionally independent,
given Z. With the aid of the result of example C3-a, we may give
this set of conditions a simple formulation which points to a general
definition. We have
$$E[I_M(X)I_N(Y) \mid Z] = \sum_k E[I_M(X)I_N(Y) \mid Z = z_k]\, I_{C_k}$$
with similar expressions for $E[I_M(X) \mid Z]$ and $E[I_N(Y) \mid Z]$. Using the facts
that $I_{C_j}I_{C_k} = 0$ for $j \ne k$ and $I_{C_k}^2 = I_{C_k}$, we obtain
$$E[I_M(X) \mid Z]\,E[I_N(Y) \mid Z] = \sum_k E[I_M(X) \mid Z = z_k]\,E[I_N(Y) \mid Z = z_k]\, I_{C_k}.$$
We thus have

i) $E[I_M(X)I_N(Y) \mid Z] = E[I_M(X) \mid Z]\,E[I_N(Y) \mid Z]$ iff

ii) $E[I_M(X)I_N(Y) \mid Z = z_k] = E[I_M(X) \mid Z = z_k]\,E[I_N(Y) \mid Z = z_k]$ for all k.

We have seen above that the set of conditions ii) is a reasonable basis
for the notion that {X,Y} is conditionally independent, given simple
random vector Z. This development suggests the simpler equivalent expression
i) may be the more useful way to characterize the condition. Further
evidence is provided by the following set of equivalent conditions (see
Sec D5 for proofs).

For any random vector Z, the following conditions are equivalent:

CI1) $E[I_M(X)I_N(Y) \mid Z] = E[I_M(X) \mid Z]\,E[I_N(Y) \mid Z]$ a.s. ∀ Borel sets M, N

CI2) $E[I_M(X) \mid Z,Y] = E[I_M(X) \mid Z]$ a.s. ∀ Borel sets M

CI3) $E[I_M(X)I_Q(Z) \mid Z,Y] = E[I_M(X)I_Q(Z) \mid Z]$ a.s. ∀ Borel sets M, Q

CI4) $E[I_M(X)I_Q(Z) \mid Y] = E\{E[I_M(X)I_Q(Z) \mid Z] \mid Y\}$ a.s. ∀ Borel sets M, Q

CI5) $E[g(X)h(Y) \mid Z] = E[g(X) \mid Z]\,E[h(Y) \mid Z]$ a.s. ∀ Borel functions g, h

CI6) $E[g(X) \mid Z,Y] = E[g(X) \mid Z]$ a.s. ∀ Borel functions g

CI7) $E[g(X,Z) \mid Z,Y] = E[g(X,Z) \mid Z]$ a.s. ∀ Borel functions g

CI8) $E[g(X,Z) \mid Y] = E\{E[g(X,Z) \mid Z] \mid Y\}$ a.s. ∀ Borel functions g.
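To make the product rule CI1) and the "no further effect" condition CI2) concrete, the following sketch builds a small discrete joint distribution in which X and Y are conditionally independent, given Z, by construction, and then verifies both conditions exactly. All probability tables here are hypothetical values chosen for illustration. The last lines show that the same pair is not independent unconditionally.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tables: p_z[z], p_x_given_z[z][x], p_y_given_z[z][y].
p_z = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_x_given_z = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
               1: {0: Fraction(1, 4), 1: Fraction(3, 4)}}
p_y_given_z = {0: {0: Fraction(2, 3), 1: Fraction(1, 3)},
               1: {0: Fraction(1, 3), 1: Fraction(2, 3)}}

# Joint distribution built so that {X, Y} is conditionally independent, given Z.
joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
         for x, y, z in product((0, 1), repeat=3)}


def prob(pred):
    """Probability of the event {(x, y, z): pred(x, y, z)}."""
    return sum(p for xyz, p in joint.items() if pred(*xyz))


def product_rule_holds(x0, y0, z0):
    """CI1)-type check: P(X=x0, Y=y0 | Z=z0) == P(X=x0 | Z=z0) P(Y=y0 | Z=z0)."""
    pz = prob(lambda x, y, z: z == z0)
    joint_xy = prob(lambda x, y, z: (x, y, z) == (x0, y0, z0)) / pz
    marg_x = prob(lambda x, y, z: x == x0 and z == z0) / pz
    marg_y = prob(lambda x, y, z: y == y0 and z == z0) / pz
    return joint_xy == marg_x * marg_y


def cond_prob_x(x0, z0, y0=None):
    """P(X=x0 | Z=z0), or P(X=x0 | Z=z0, Y=y0) when y0 is supplied."""
    def given(x, y, z):
        return z == z0 and (y0 is None or y == y0)
    return prob(lambda x, y, z: x == x0 and given(x, y, z)) / prob(given)


triples = list(product((0, 1), repeat=3))
ci1_holds = all(product_rule_holds(*t) for t in triples)
ci2_holds = all(cond_prob_x(x0, z0, y0) == cond_prob_x(x0, z0)
                for x0, y0, z0 in triples)

# Unconditionally, the product rule fails for this example.
p_xy_11 = prob(lambda x, y, z: x == 1 and y == 1)
p_x_1 = prob(lambda x, y, z: x == 1)
p_y_1 = prob(lambda x, y, z: y == 1)
unconditionally_independent = (p_xy_11 == p_x_1 * p_y_1)
```

The example underlines that conditional independence, given Z, neither implies nor is implied by ordinary independence of the pair.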



Several facts should be noted. For one thing, properties CI5) through
CI8) are generalizations of CI1) through CI4), respectively, in that the
indicator functions are replaced by real-valued Borel functions, subject
only to the restriction that the resultant random variables g(X), h(Y),
and g(X,Z) should have finite expectations. It is desirable to have
the properties CI1) through CI4) included in the list of equivalences to
show that it is sufficient to establish one of these simpler conditions
in order to be able to assert the apparently more general counterparts
CI5) through CI8), respectively.

Expressions CI2) and CI6) show that if X is conditioned by Z,
further conditioning by Y has no appreciable effect. We thus have an
analogy to the idea that the event pair {A,B} is conditionally independent,
given event C, if once A is conditioned by C, then further conditioning
by B has no effect on the likelihood of the occurrence of A. Expressions
CI3) and CI7) generalize this to say that if (X,Z) is conditioned
by Z, further conditioning by Y has no appreciable effect. It is clear
that the role of X and Y could be interchanged in these statements.
Conditions CI4) and CI8) have no counterpart in the theory of conditional
independence of events. They do, however, play an important role in the
theory of the new concept; and they include as a special case the
Chapman-Kolmogorov equation which plays a prominent role in the theory
of Markov processes (cf. Sec E4).

These considerations indicate that we have identified a potentially
useful concept which is properly named conditional independence. We can
use any of the eight equivalent propositions as the basis for definition.
As in the case of independence of events and of random variables, we use
the product rule CI1).

DEFINITION. The pair of random vectors {X,Y} is conditionally
independent, given Z, iff the product rule CI1) holds.
An arbitrary class of random vectors is conditionally independent, given
Z, iff an analogous product rule holds for each finite subclass of two
or more members of the class.

If the pair {X,Y} is conditionally independent, given Z, we should
expect that any Borel functions of these two variables should be conditionally
independent. This is the case.

CI9) If {X,Y} is conditionally independent, given Z, $U = h(X)$, and
$V = k(Y)$, with h, k Borel, then {U,V} is conditionally independent,
given Z.

For convenience of reference, we list several additional properties of
conditional independence utilized in various subsequent developments.

CI10) If the pair {X,Y} is conditionally independent, given Z, then

a) $E[g(X)h(Y)] = E\{E[g(X) \mid Z]\,E[h(Y) \mid Z]\} = E[e_1(Z)e_2(Z)]$, and

b) $E[g(X) \mid Y \in N]\,P(Y \in N) = E\{E[I_N(Y) \mid Z]\,E[g(X) \mid Z]\}$.

CI11) If {Y, (X,Z)} is independent, then {X,Y} is conditionally
independent, given Z.

CI12) If {X,Y} is conditionally independent, given Z, then
$$E[g(X,Y) \mid Y = u, Z = v] = E[g(X,u) \mid Z = v] \quad \text{a.s. } [P_{YZ}].$$

Proofs of these propositions are provided in Sec D5.

2. Some elements of Bayesian analysis 

Classical statistics postulates a population distribution to be determined
by sampling or some other appropriate form of experimentation. Typically,
the distribution is supposed to belong to a specified class (e.g.,
normal, exponential, binomial, Poisson, etc.) which is characterized by
certain parameters. A finite set of parameters can be viewed as a set of
coordinates for a single vector-valued parameter. The value $\theta$ of the
parameter is assumed fixed, but is unknown. Hence, there is uncertainty
about its value.

An alternative formulation results from modeling the uncertainty in
a probabilistic manner. The uncertain value of the parameter is viewed as
the value of a random vector; i.e., $\theta = H(\omega)$. The value $H(\omega)$ of the
parameter random vector H reflects the state of nature. If X is a
random variable representing the population, then the distribution for X
is determined by the value $\theta = H(\omega)$ of the parameter random vector. To
carry out statistical analysis, we must characterize appropriately the
joint distribution for the pair (X,H). This is usually done by assuming
a conditional distribution for X, given H, represented by conditional
distribution function $F_{X|H}$ (or an appropriate alternative); and by
utilizing any information about the probable values of the parameter to
determine a prior distribution for H, represented by a distribution
function $F_H$ (or some appropriate alternative).

A central notion of classical statistics is a random sample of size n.
Some sampling act, or survey, or experiment is done repeatedly, in
such a way that the outcome of one sampling act does not affect operationally
the outcome of any other. This is modeled as a class $\{X_1, X_2, \ldots, X_n\}$
of independent random variables, each having the population distribution.
Each sampling act corresponds to the observation of one of the random
variables in the sample. A random sample is a finite case of an arbitrary
iid (independent, identically distributed) class $\{X_i: i \in J\}$.

Under the new point of view, the appropriate assumption seems to be
that the class $\{X_i: i \in J\}$ is conditionally independent, given H, with
all random variables having the same conditional distribution, given
H. Under a given state of nature, the result of taking an observation of
$X_i$ does not affect and is not affected by the results of observing any
combination of the other random variables. We find it convenient to adopt
the following terminology:

DEFINITION. A class $\{X_i: i \in J\}$ is ciid, given H, iff the class is
conditionally independent, and each random variable has the same
conditional distribution, given H. A random sample (of size n), given H,
is a finite class $\{X_i: 1 \le i \le n\}$ which is ciid, given H.
Let us see what this means for the conditional distribution functions.
To simplify writing, put $W = (X_1, X_2, \ldots, X_n)$ and let $I_t = I_{N_t}$, where
$N_t = (-\infty, t]$. Then
$$F_{W|H}(t_1, t_2, \ldots, t_n \mid u) = P(X_1 \le t_1, X_2 \le t_2, \ldots, X_n \le t_n \mid H = u)$$
$$= E[I_{t_1}(X_1) I_{t_2}(X_2) \cdots I_{t_n}(X_n) \mid H = u]$$
$$= \prod_{i=1}^{n} E[I_{t_i}(X_i) \mid H = u] \quad \text{by conditional independence}$$
$$= \prod_{i=1}^{n} F_{X|H}(t_i \mid u).$$
Thus, the conditional distribution function obeys the product rule. Partial
differentiation by the $t_i$ shows that the conditional density, when it exists,
also satisfies the product rule
$$f_{W|H}(t_1, t_2, \ldots, t_n \mid u) = \prod_{i=1}^{n} f_{X|H}(t_i \mid u).$$

Bernoulli trials, given H. We illustrate these ideas by considering the
important special case of Bernoulli trials. A sequence of "identical" trials
is performed in an operationally independent manner. Let $E_i$ = event of a
"success" on the ith trial in the sequence, and set $X_i = I_{E_i}$, so that
$X_i$ has the property that it takes on the value 1 if $E_i$ occurs and
takes on the value 0 if $E_i$ fails to occur ($E_i^c$ occurs). On a given
sequence of trials, the probability p of success on a trial does not
vary with i. Now p is a parameter, representing a state of nature. We
model it as the value of a parameter random variable H with the interval
[0,1] as its range. For a given value of H, the results of the various
trials are conditionally independent. Thus, we assume $\{X_i : 1 \le i\}$ is
ciid, given H. Let $I_{\{0\}}$ be the indicator function for the set {0}
and similarly for $I_{\{1\}}$. We suppose

$P(E_i | H = u) = P(X_i = 1 | H = u) = E[I_{\{1\}}(X_i) | H = u] = u$,    $0 \le u \le 1$
$P(E_i^c | H = u) = P(X_i = 0 | H = u) = E[I_{\{0\}}(X_i) | H = u] = 1 - u$.
These assumptions ensure

$E[X_i | H = u] = e(u) = u$ a.s. $[P_H]$.

It is convenient in this case to say the sequence is Bernoulli, given
H = u. To see how analysis of such sequences relates to analysis of ordinary
Bernoulli sequences, suppose, for example, we observe the sequence $E_1 E_2^c E_3^c$.
Then

$P(E_1 E_2^c E_3^c | H = u) = E[I_{\{1\}}(X_1) I_{\{0\}}(X_2) I_{\{0\}}(X_3) | H = u]$
    $= E[I_{\{1\}}(X_1) | H = u] \, E[I_{\{0\}}(X_2) | H = u] \, E[I_{\{0\}}(X_3) | H = u]$
    $= u(1 - u)(1 - u).$

The product after the second equality sign is a result of conditional
independence. The pattern here is obviously the same as in the analysis of
ordinary Bernoulli trials, except that we write u for p. To obtain the
conditional probability of any such sequence, given H = u, include a
factor u for each uncomplemented $E_i$ and a factor 1 - u for each
complemented $E_i$.

The random variable $S_n = X_1 + X_2 + \cdots + X_n$ counts the number of
"successes" in the first n trials of a sequence. In the ordinary case,
$S_n$ has the binomial distribution with parameters (n, p). As the discussion
above indicates, if the sequence is Bernoulli, given H = u, we simply
replace p by u in the analysis of the ordinary case to obtain

$P(S_n = k | H = u) = C(n,k) u^k (1 - u)^{n-k}$,    $0 \le u \le 1$.

We still have the problem of determining the distribution of H (i.e.,
the prior distribution with which to begin analysis). Partly because of the
well-known integral formula

$\int_0^1 u^r (1 - u)^s \, du = \frac{r! \, s!}{(r + s + 1)!} = \frac{1}{(r + s + 1) \, C(r+s, r)},$

a commonly employed class of distributions is the class of Beta
distributions. A random variable H has the Beta distribution with parameters
(a+1, b+1) (a, b nonnegative integers) iff it has the density function

$f_H(t) = \frac{(a + b + 1)!}{a! \, b!} t^a (1 - t)^b$ for $0 < t < 1$, and $f_H(t) = 0$ otherwise.

Note that if a = b = 0, the distribution is uniform on [0,1].

Straightforward calculations show that $f_H$ has a maximum at t = a/(a + b), and

$E[H] = \frac{a + 1}{a + b + 2}$,    $\mathrm{Var}[H] = \frac{(a + 1)(b + 1)}{(a + b + 2)^2 (a + b + 3)}$,

$E[H^2] = \frac{(a + 1)(a + 2)}{(a + b + 2)(a + b + 3)}$,

$E[H^k] = \frac{(a + 1)(a + 2) \cdots (a + k)}{(a + b + 2)(a + b + 3) \cdots (a + b + k + 1)}$.

If prior knowledge indicates that the value of H lies in a certain part
of the unit interval, with a degree of certainty reflected in the size of
the variance, the parameters a, b may be adjusted to reflect these
conditions. If there is no prior knowledge favoring any set of probable values,
the complete ignorance may be expressed by taking the uniform case (a = b = 0).

Example D2-a

A quantity of n items from one run of a production line is selected
at random for testing. There is probability p that any device in the
lot will meet specifications. The quantity p is constant over any one run;
its value depends on how well the manufacturing process, including selection
or preparation of raw materials, is controlled. We wish to estimate the
parameter p from tests and prior knowledge.
SOLUTION.

We adopt the point of view that p is the value of a parameter random
variable H. Past experience indicates a reasonable prior distribution
is Beta, with parameters (3,2), i.e., a = 2, b = 1. Thus
$f_H(t) = 12 t^2 (1 - t)$, $0 \le t \le 1$ (maximum at t = 2/3). Then

$P(E_i) = E[X_i] = E\{E[X_i | H]\} = E[e(H)] = E[H] = \frac{a + 1}{a + b + 2} = 3/5.$

Suppose $X_1 = 1$. Then

$P(E_2 | X_1 = 1) = P(E_1 E_2)/P(E_1) = E[X_1 X_2]/E[X_1] = E[e(H)e(H)]/E[X_1]$    by CI9)
    $= E[H^2]/E[H] = \frac{(a + 1)(a + 2)}{(a + b + 2)(a + b + 3)} \cdot \frac{a + b + 2}{a + 1}$
    $= \frac{a + 2}{a + b + 3} = 4/6 = 2/3.$

Note that $\{E_1, E_2\}$ is not an independent pair, since $P(E_2 | E_1) \ne P(E_2)$.
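
The arithmetic of Example D2-a can be checked with exact rational numbers. The sketch below is only an illustration: the helper name `beta_moment` is ours, and it simply evaluates the moment formula for a Beta (a+1, b+1) distribution with nonnegative integer a, b.

```python
from fractions import Fraction

def beta_moment(a, b, k):
    """k-th moment E[H^k] of H ~ Beta(a+1, b+1), a and b nonnegative integers.

    Uses E[H^k] = (a+1)(a+2)...(a+k) / [(a+b+2)(a+b+3)...(a+b+k+1)].
    """
    num = Fraction(1)
    den = Fraction(1)
    for j in range(1, k + 1):
        num *= a + j
        den *= a + b + j + 1
    return num / den

a, b = 2, 1                       # prior Beta(3, 2), as in Example D2-a
p_e1 = beta_moment(a, b, 1)       # P(E_1) = E[H]
p_e1e2 = beta_moment(a, b, 2)     # P(E_1 E_2) = E[H^2]
p_e2_given_e1 = p_e1e2 / p_e1     # P(E_2 | X_1 = 1)

print(p_e1, p_e2_given_e1)
```

Since the two printed values differ, the pair of events is not independent, in agreement with the remark above.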

Suppose a prior distribution for H is assumed. A sequence of n
trials is performed. It is desired to update the distribution for H on
the basis of the results of this experiment. Suppose k successes occur
(i.e., $S_n = k$). We want to determine the conditional distribution for
H, given $S_n = k$. Now

$F_{H|S_n}(u|k) = P(H \le u | S_n = k) = E[I_N(H) I_{\{k\}}(S_n)] / E[I_{\{k\}}(S_n)]$
    $= E\{I_N(H) E[I_{\{k\}}(S_n) | H]\} / E\{E[I_{\{k\}}(S_n) | H]\}$, where $N = (-\infty, u]$ and

$E[I_{\{k\}}(S_n) | H = u] = P(S_n = k | H = u) = C(n,k) u^k (1 - u)^{n-k}$,    $0 \le u \le 1$.
Suppose H has the Beta distribution with parameters (a+1, b+1). Then

$F_{H|S_n}(u|k) = \frac{A \int_0^u t^{a+k} (1 - t)^{b+n-k} \, dt}{A \int_0^1 t^{a+k} (1 - t)^{b+n-k} \, dt}$, where $A = C(n,k) \frac{(a + b + 1)!}{a! \, b!}$,

    $= \frac{(n + a + b + 1)!}{(a + k)! \, (n + b - k)!} \int_0^u t^{a+k} (1 - t)^{b+n-k} \, dt$,    $0 \le u \le 1$.

Thus, the conditional distribution is Beta, with parameters (a + k + 1,
b + n - k + 1). From the formula for expectation, we have

$E[H | S_n = k] = \frac{a + k + 1}{a + b + n + 2}.$

It should be noted that since the common factor C(n,k) in the numerator
and denominator of the expression for $F_{H|S_n}$ cancels out, the distribution,
given $S_n = k$, is the same as that given any specific sequence having
k successes and n - k failures.
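
As a check on the updating rule, the conditional expectation can be computed directly as a ratio of two integrals of the form $\int_0^1 u^r (1 - u)^s \, du = r! \, s!/(r + s + 1)!$. The following is a minimal sketch, assuming nonnegative integer a, b; the function names are ours. The factor C(n,k) and the constant of the prior density are omitted because they cancel in the ratio.

```python
from fractions import Fraction
from math import factorial

def beta_integral(r, s):
    """Integral of u^r (1-u)^s over [0, 1], equal to r! s! / (r+s+1)!."""
    return Fraction(factorial(r) * factorial(s), factorial(r + s + 1))

def posterior_mean(a, b, n, k):
    """E[H | S_n = k] for a Beta(a+1, b+1) prior, as a ratio of integrals.

    Numerator:   integral of t * t^(a+k) (1-t)^(b+n-k)
    Denominator: integral of     t^(a+k) (1-t)^(b+n-k)
    """
    num = beta_integral(a + k + 1, b + n - k)
    den = beta_integral(a + k, b + n - k)
    return num / den

a, b, n, k = 2, 1, 10, 8
print(posterior_mean(a, b, n, k), Fraction(a + k + 1, a + b + n + 2))
```

Both printed values agree, which is the closed form (a + k + 1)/(a + b + n + 2) obtained from the Beta (a + k + 1, b + n - k + 1) distribution.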

The previous development illustrates how conditional independence in
a random sample, rather than total independence, may be utilized to modify
estimates of probabilities or other parameters which control population
probabilities. But decisions are based both on estimates of probabilities
and on costs or rewards associated with actions and outcomes. One is apt
to proceed much more cautiously (i.e., to require higher probabilities for
favorable outcomes) if costs of failure are high, or to be much more
venturesome if rewards for success are great. To provide an analytical basis for
decision, one must include some measure or criterion of gain or loss, in
order that a "best" course of action may be determined. To illustrate,
we consider one of the most commonly used criteria: the mean-squared-error
criterion.

Suppose {X,H} has a joint distribution and it is desired to obtain a
"best" estimate of the value of H from an experimentally determined
value of X. That is, we wish to determine a function or decision rule d
such that $d[X(\omega)]$ is the best estimate of $H(\omega)$. According to the
mean-squared-error criterion, we seek a function d for which $E\{[H - d(X)]^2\}$
is a minimum. The following argument shows that the best decision function
d is given by

$d(u) = E[H | X = u] = e(u)$,    for all u in the range of X.
We note that X may be vector valued, in which case u is a vector.
Consider

$0 \le E\{[H - d(X)]^2\} = E\{[H - e(X) + e(X) - d(X)]^2\}$
    $= E\{[H - e(X)]^2\} + E\{[e(X) - d(X)]^2\} + 2E\{[H - e(X)][e(X) - d(X)]\}.$

Suppose we put h(X) = e(X) - d(X). By CE6), E[Hh(X)] = E[e(X)h(X)],

so that the last term above is zero. The first term is fixed. The
second term is positive, unless d(X) = e(X) a.s., which is equivalent
(by Theorem A1-2) to d(u) = e(u) a.s. $[P_X]$. Hence, this choice of d
minimizes the mean-squared error.
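
The decomposition can be illustrated on a small discrete case. The joint probabilities below are hypothetical (they do not come from the text), and the helper names are ours. The sketch computes e(x) = E[H|X = x] and confirms that perturbing the rule away from e never lowers the mean-squared error.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(X = x, H = h); an assumption for illustration.
joint = {
    (0, F(1, 4)): F(3, 10), (0, F(3, 4)): F(1, 10),
    (1, F(1, 4)): F(2, 10), (1, F(3, 4)): F(4, 10),
}

def cond_mean(x):
    """e(x) = E[H | X = x]."""
    px = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(h * p for (xx, h), p in joint.items() if xx == x) / px

def mse(d):
    """E{[H - d(X)]^2} for a decision rule d given as a dict x -> d(x)."""
    return sum(p * (h - d[x]) ** 2 for (x, h), p in joint.items())

e = {x: cond_mean(x) for x in (0, 1)}
best = mse(e)

# Perturb the rule by small amounts at each value of X.
steps = (F(-1, 10), F(0), F(1, 10))
others = [mse({0: e[0] + dx, 1: e[1] + dy}) for dx in steps for dy in steps]

print(e, best, min(others))
```

The excess error of a perturbed rule equals $E\{[e(X) - d(X)]^2\}$, the second term of the decomposition, since the cross term vanishes.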

The argument above solves the regression problem, in which it is
desired to determine the random variable d(X) which is "nearest" to H
in the mean-squared sense. The central role of conditional expectation is
well known. In fact, some authors begin the study of conditional
expectation by designating the conditional expectation of X, given Y, as the
random variable e(Y) for which the mean-squared error $E\{[X - e(Y)]^2\}$
is a minimum. Starting from this point, it is possible to show that e(Y)
has all the properties of the concept as we have introduced it.
Example D2-b 

Returning to the situation presented in Example D2-a, we suppose n items
are selected at random from the production lot and tested. Of these, k
meet specifications. What is the best estimate, in the mean-squared sense,
of the probability that any item selected will meet specifications?
SOLUTION AND DISCUSSION.
By the development above, if the prior distribution for H is Beta
(a+1, b+1), the best estimator for H, given $S_n$, is

$E[H | S_n] = \frac{S_n + a + 1}{a + b + n + 2}.$

The rule is: count the number of successes in the n units tested, add
a + 1, and divide by a + b + n + 2.

Suppose no prior information about H is available; we should use a = b = 0.
Suppose, further, that in a test of 10 items, 8 meet specifications. Then

$E[H | S_{10} = 8] = \frac{8 + 1}{10 + 2} = 9/12 = 3/4$, as compared with E[H] = 1/2.

If the prior distribution were Beta with a = 2, b = 1, then

$E[H | S_{10} = 8] = \frac{8 + 2 + 1}{2 + 1 + 10 + 2} = 11/15 \approx 0.7333$, as compared with E[H] = 3/5.

The conditional distribution for H, given $S_n = k$, is Beta (a+k+1, b+n-k+1).
The conditional variance is

$\mathrm{Var}[H | S_n = k] = \frac{(a + k + 1)(b + n - k + 1)}{(a + b + n + 2)^2 (a + b + n + 3)}.$

For a = b = 0, n = 10, k = 8,

$\mathrm{Var}[H | S_{10} = 8] = (9 \times 3)/(12^2 \times 13) = 3/208 \approx 0.0144.$

For a = 2, b = 1, n = 10, k = 8,

$\mathrm{Var}[H | S_{10} = 8] = (11 \times 4)/(15^2 \times 16) = 11/900 \approx 0.0122.$

The prior information, with its approximate location and indication of
variance, gives rise to a somewhat smaller variance on the conditional
distribution.
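
The figures of Example D2-b follow from the mean and variance of the Beta (a+k+1, b+n-k+1) distribution. A short sketch in exact arithmetic (the function name `posterior` is ours):

```python
from fractions import Fraction

def posterior(a, b, n, k):
    """Mean and variance of H given S_n = k, for a Beta(a+1, b+1) prior.

    The conditional distribution is Beta(alpha, beta) with
    alpha = a + k + 1 and beta = b + n - k + 1.
    """
    alpha, beta = a + k + 1, b + n - k + 1
    total = alpha + beta                      # equals a + b + n + 2
    mean = Fraction(alpha, total)
    var = Fraction(alpha * beta, total ** 2 * (total + 1))
    return mean, var

for a, b in [(0, 0), (2, 1)]:
    mean, var = posterior(a, b, n=10, k=8)
    print(f"a={a}, b={b}: mean={mean} ({float(mean):.4f}), var={var} ({float(var):.4f})")
```

The same function may be used to explore how the choice of a, b changes the estimate for a fixed test result.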



For a more general discussion of the problem of Bayesian estimation,
as this procedure is called, see Mood, Graybill, and Boes [1974], Chap VII,
Sec 7. Although they do not employ the term conditional independence,
they assume it by virtue of assuming the product rule for conditional
densities. They consider other measures of distance or "loss", and
relate the results to the results of other estimation procedures commonly
employed in modern statistics.


3. A one-stage Bayesian decision model

The transition from inference (i.e., determining the most likely
alternative) to decision (determining the course of action to be selected)
leads to the notion of gain or loss. In order to move beyond a purely
mathematical criterion such as mean-squared error, we introduce the
notion of a loss function. The loss function is usually expressed in terms
of some symbol of value, such as monetary units. But its specification
may require quite subtle and subjective judgments of "utility" or worth.
In order to be objective, the decision analyst must obtain from the decision
maker enough information to determine a loss function whose value depends
upon the course of action chosen and the resultant outcome of this action.
To set up a model of a typical decision process, we suppose:
  i) There is a set of possible actions available to the decision maker.
     Action a is a member of the set A of possible actions.
 ii) There is a set of possible outcomes which may result from the
     action. Because there is uncertainty about which consequence will
     materialize, we represent the outcome as the value of an outcome random
     variable (or random vector): $y = Y(\omega)$.
iii) The distribution of the outcome random variable Y is determined
     by a state of nature. This is often expressed as a parameter
     (possibly vector-valued). Since there is uncertainty about the state
     of nature, the parameter itself is modeled as the value of a
     parameter random variable: $u = H(\omega)$.
 iv) It may be possible to experiment in order to obtain some information
     about the state of nature. The result of the experiment is the
     value of a test random variable: $x = X(\omega)$. Both Y and X are
     jointly distributed with the parameter random variable H.
  v) A loss function L is determined. L(a,y) is the loss when action
     a is taken and outcome y is experienced (a gain is a negative
     loss). The usual objective is to minimize the expected loss.
 vi) If experimentation is utilized, a decision rule (or strategy) is
     determined, to indicate the action to be taken for each possible
     observed value of the test random variable. In practice, the
     value of the decision rule may be determined for only the specific
     experimental result observed.
We consider two cases.
a) Without experimentation.

Assume {Y,H} has a joint distribution. Let $\ell(a,u) = E[L(a,Y) | H = u]$.
This is sometimes known as the risk function. The objective is to
select action a to minimize $R(a) = E[L(a,Y)] = E\{E[L(a,Y) | H]\}
= E[\ell(a,H)]$. In some problems, Y = H, so that $\ell(a,u) = L(a,u)$. In
the case of no experimentation, no conditional independence assumptions
are needed.

Example D3-a

A merchant plans to stock an item. The demand over a six-week period
is assumed to be a random quantity having the Poisson distribution, with
parameter $\lambda$. The parameter value is not known, but on the basis of past
experience the merchant assumes $\lambda$ to be the value of a random variable
H with possible values {15, 20, 25} taken on with probabilities
{1/4, 1/2, 1/4}, respectively. The merchandise may be ordered in lots of
10. The merchant contemplates ordering either 10, 20, or 30 units. He
can buy at a cost of c = $7 per unit; he can sell at a price u = $10
per unit. At the end of six weeks, he can return the unsold items for a
net recovery of r = $3 per unit, so that he loses c - r = $4 per
unsold unit. He considers that he has lost (u - c)/2 = $1.50 per
missed sale. From a Bayesian point of view, how many units should he order?
SOLUTION.

The set of possible actions is A = {10, 20, 30}. Let Y be the random
variable whose value is the demand in the six-week period (the outcome
random variable). The conditional distribution of Y, given $H = \lambda$, is
assumed to be Poisson ($\lambda$). The loss function L is given by

$L(a,y) = 4a - 7y$    for $y \le a$
$L(a,y) = -3a + 1.5(y - a) = -4.5a + 1.5y$    for $y > a$.



If we set $B = \{Y \le a\}$, we may then write

$L(a,Y) = I_B (4a - 7Y) + (1 - I_B)(1.5Y - 4.5a) = 1.5Y - 4.5a - 8.5 I_B (Y - a).$
Now $\ell(a,\lambda) = E[L(a,Y) | H = \lambda]$

    $= 1.5 E[Y | H = \lambda] - 4.5a + 8.5a P(Y \le a | H = \lambda) - 8.5 E[I_B Y | H = \lambda].$

We may express

$E[I_B Y | H = \lambda] = \sum_{k=0}^{a} k \frac{\lambda^k}{k!} e^{-\lambda} = \lambda \sum_{k=0}^{a-1} \frac{\lambda^k}{k!} e^{-\lambda} = \lambda P(Y \le a - 1 | H = \lambda).$

Hence

$\ell(a,\lambda) = \lambda [1.5 - 8.5 P(Y \le a - 1 | H = \lambda)] - a [4.5 - 8.5 P(Y \le a | H = \lambda)].$
Using a table of cumulative or summed Poisson distribution for appropriate
values of $\lambda$, we may establish the following table of values for $\ell(a,\lambda)$.

                  a = 10     a = 20     a = 30
$\lambda$ = 15     -21.3      -23.2      +15.0
$\lambda$ = 20     -14.9      -44.9      -19.7
$\lambda$ = 25      -7.4      -49.5      -51.3

Now $R(a) = E[L(a,Y)] = E[\ell(a,H)]$ has values:

R(10) = (1/4)[-21.3 - 14.9 x 2 - 7.4] = -14.6; R(20) = -40.6; and
R(30) = -18.9.

The optimum action, corresponding to the minimum expected loss, is a = 20.
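
The values of $\ell(a,\lambda)$ and R(a) in Example D3-a can be recomputed without tables. The sketch below is a minimal illustration with names of our choosing; it sums the Poisson probabilities directly, so individual entries may differ in the last digit from values obtained with three-place tables.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); the empty sum gives 0 for k < 0."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))

def risk(a, lam):
    """l(a, lam) = lam[1.5 - 8.5 P(Y <= a-1)] - a[4.5 - 8.5 P(Y <= a)]."""
    return (lam * (1.5 - 8.5 * poisson_cdf(a - 1, lam))
            - a * (4.5 - 8.5 * poisson_cdf(a, lam)))

prior = {15: 0.25, 20: 0.50, 25: 0.25}   # distribution of H in Example D3-a
actions = (10, 20, 30)

# R(a) = E[l(a, H)]: average the risk over the prior distribution of H.
R = {a: sum(p * risk(a, lam) for lam, p in prior.items()) for a in actions}
best = min(R, key=R.get)

for a in actions:
    print(a, [round(risk(a, lam), 1) for lam in prior], round(R[a], 1))
print("optimum order:", best)
```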


b) With experimentation

Assume {X,Y,H} has a joint distribution. A decision is made on the
basis of the experimental data (i.e., on the basis of the observed value
of X). The problem is to determine the optimum decision function d* which
designates the optimum action d*(x) when the test random variable X has
value x. Thus, d* is the decision function which minimizes the
risk B(d) = E[L(d(X),Y)].

The problem may be formulated in a useful way as follows. By CE1b),
$B(d) = E\{E[L(d(X),Y) | X]\}$. If we set $R(a,x) = E[L(a,Y) | X = x]$, then by CE10),
$R(d(x),x) = E[L(d(x),Y) | X = x] = E[L(d(X),Y) | X = x]$. Thus, $R(d(X),X) =
E[L(d(X),Y) | X]$, so that $B(d) = E[R(d(X),X)]$. For each x in the range of
X, let d*(x) be the action for which R(d*(x),x) is a minimum. Then
$B(d^*) = E[R(d^*(X),X)] \le E[R(d(X),X)] = B(d)$, for all possible decision
functions d.

In the usual situation, the result of experimentation does not affect
operationally the outcome following the action. The experimental evidence
may be in the form of previously available data. The result of a given
action is not influenced by whether or not the decision maker obtains the
experimental data. What does affect the outcome following an action is
the value of the "state of nature" parameter. Thus, it is appropriate to
assume the pair {X,Y} is conditionally independent, given H. We utilize
this as follows.

1) If X is discrete, we may use CI10b) to assert
   $R(a,x) = E[L(a,Y) | X = x] = E\{E[I_{\{x\}}(X) | H] E[L(a,Y) | H]\} / E\{E[I_{\{x\}}(X) | H]\}$,
   where $E[I_{\{x\}}(X) | H = u] = P(X = x | H = u) = p_{X|H}(x|u)$ and
   $E[L(a,Y) | H] = \ell(a,H)$.
   Hence,

   $R(a,x) = \int \ell(a,u) \, p_{X|H}(x|u) \, dF_H(u) / P(X = x).$

2) If X is absolutely continuous,
   $R(a,x) = E[L(a,Y) | X = x] = E\{E[L(a,Y) | H] | X = x\}$    by CI8)
       $= E[\ell(a,H) | X = x] = \int \ell(a,u) \, dF_{H|X}(u|x).$

   We may use Bayes' theorem for the conditional distribution (see Sec. C6)
   to determine $F_{H|X}$.

Example D3-b

Suppose in Example D3-a the merchant recalls that he made a similar
order for a corresponding period the previous year. If X is the random
variable whose value represents the demand that period, an observation
of the value for that period should provide some indication of the state
of the market for the period. If there is reason to believe that the state
of the market has not changed appreciably, this information should be
useful for the present decision. Enough time has elapsed that sales in the
previous period should not influence directly sales in the current period.
Therefore, it seems reasonable to assume that {X,Y} is conditionally
independent, given H (the value of which indicates the general state of
the market). A check of the previous sales records shows that demand was
for 24 units. Under these assumptions and with these data, the task is
to select a = d*(24) to minimize

$R(a,24) = \sum_{\lambda} \ell(a,\lambda) \, p_{X|H}(24|\lambda) \, p_H(\lambda) / P(X = 24).$

Values of $\ell(a,\lambda)$ are tabulated in the solution of Example D3-a. Under
the assumed conditions, we may reasonably suppose $P_{X|H} = P_{Y|H}$. From
tables of the Poisson distribution, we obtain values of $p_{X|H}(24|\lambda)$, from
which we determine

$p_X(24) = p_{X|H}(24|15) p_H(15) + p_{X|H}(24|20) p_H(20) + p_{X|H}(24|25) p_H(25)$
    $= (1/4)[0.0083 + 0.0557 \times 2 + 0.0795] = 0.050$

and

$R(10,24) = \frac{-21.3 \times 0.0083 - 14.9 \times 2 \times 0.0557 - 7.4 \times 0.0795}{4 \times 0.050} \approx -12.2.$

The values R(20,24) = -45.8 and R(30,24) = -30.9 may be calculated
in similar fashion. Once more, the indicated optimum action is to order
20 units. In spite of the fact that the previous demand went beyond 20
units, the best bet is to order 20 units and risk the loss of some sales.
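
The calculation in Example D3-b amounts to reweighting the prior by the Poisson probabilities of the observed demand. The following sketch takes that view; the $\ell(a,\lambda)$ entries are typed in as data rounded to one decimal and are assumed to match the table of Example D3-a, and all names are ours.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Assumed values of l(a, lam): outer key lam, inner key a.
risk_table = {
    15: {10: -21.3, 20: -23.2, 30: 15.0},
    20: {10: -14.9, 20: -44.9, 30: -19.7},
    25: {10: -7.4, 20: -49.5, 30: -51.3},
}
prior = {15: 0.25, 20: 0.50, 25: 0.25}
x_obs = 24                                   # previous period's demand

# Joint weights p_{X|H}(x|lam) p_H(lam); their sum is P(X = x).
weights = {lam: poisson_pmf(x_obs, lam) * p for lam, p in prior.items()}
p_x = sum(weights.values())

# R(a, x) = sum over lam of l(a, lam) * weight(lam) / P(X = x).
R24 = {a: sum(risk_table[lam][a] * w for lam, w in weights.items()) / p_x
       for a in (10, 20, 30)}
best = min(R24, key=R24.get)

print(round(p_x, 4), {a: round(r, 1) for a, r in R24.items()}, best)
```

Changing `x_obs` shows how a different observed demand shifts the weights toward larger or smaller values of $\lambda$.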






4. A dynamic programming example

The following example of a multistage decision process is presented in
Gaver and Thompson [1973], p. 392 ff. Our discussion displays the role of
the conditional independence assumption, which seems to be both appropriate
and necessary.

Example D4-a

A company is offered two investment opportunities, which we designate
"risk" and "safe".

1) Risk. Either make gain g in a given period or earn nothing.
   Probability of success is unknown, but constant, over the total
   time considered.

2) Safe. Certain to make gain s in the given period.

Gains in successive periods are independent, given a fixed probability of
success. A choice is made at the beginning of each time period, with
negligible cost for switching from one investment to the other. The objective
is to maximize expected gain over N time periods.
SOLUTION.

The probability of success is unknown; we suppose that it is the value of a
state-of-nature random variable H. A prior density $f_H$ (or distribution
function $F_H$) is assumed. To obtain further information, the company must
experiment by making the risky investment. Suppose $I_k$ is the indicator
function for success in the kth risk period (i.e., $I_k = 1$ iff the risk
pays off on the kth trial). The gain during that period is $g I_k$. We assume
the class $\{I_k : 1 \le k \le N\}$ is identically distributed, conditionally
independent, given H, with $E[I_k | H = t] = P(I_k = 1 | H = t) = t$. Suppose n
risks have been taken; let $S_n$ be the random variable which counts the
number of successes; i.e., $S_n = I_1 + I_2 + \cdots + I_n$. The succession
of choices to take the risky alternative constitutes a Bernoulli sequence,
with conditional independence, given the parameter random variable H.
In Sec D-2, we establish an expression for
$p(n,k) = E[I_{n+1} | S_n = k] = E[H | S_n = k]$, when H has the Beta distribution.
If there is no basis for assigning a given prior distribution for H,
we assign the uniform distribution. According to the results in Example
D2-b, we have

$p(n,k) = \frac{k + 1}{n + 2}.$

To develop a strategy based on optimum expected gain, we utilize the
backward induction procedure of dynamic programming. Consider the beginning
of the jth period. If n risks have been taken before stage j, then
there is an "optimum-path" gain random variable
$G_{n,j} = f_{n,j}(S_n, I_{n+1}, \ldots, I_N)$. At most N risks will be taken, but not
necessarily this many. It is convenient to use a decision tree to keep
account of the alternatives (see Fig. D4-1).
Suppose $S_n = k$. The decision rule is: risk iff

$E[g I_{n+1} + G_{n+1,j+1} | S_n = k] > s + E[G_{n,j+1} | S_n = k].$

We wish to obtain an expression for $G_{n,j}$. Consider the set

$M = \{k : E[g I_{n+1} + G_{n+1,j+1} | S_n = k] > s + E[G_{n,j+1} | S_n = k]\}.$



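Then

$G_{n,j} = I_M(S_n)[g I_{n+1} + G_{n+1,j+1}] + I_{M^c}(S_n)[s + G_{n,j+1}].$

Since $E[X|A] = E[X|AB]P(B|A) + E[X|AB^c]P(B^c|A)$, $E[g I_{n+1} | S_n = k, I_{n+1} = 1] = g$,
and $P(I_{n+1} = 1 | S_n = k) = p(n,k)$, we obtain

$E[g I_{n+1} + G_{n+1,j+1} | S_n = k]$
    $= [g + \varphi_{j+1}(n+1,k+1)] p(n,k) + \varphi_{j+1}(n+1,k)[1 - p(n,k)],$

where $\varphi_j(n,k) = E[G_{n,j} | S_n = k]$. We may formulate the decision rule as
follows:

$\varphi_j(n,k) = \max\{[g + \varphi_{j+1}(n+1,k+1)] p(n,k) + \varphi_{j+1}(n+1,k)[1 - p(n,k)], \; s + \varphi_{j+1}(n,k)\}$

with $\varphi_{N+1}(n,k) = 0$.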

To see how the procedure goes, let g = 5/2, s = 1, N = 2, $f_H(t) = 1$ on
[0,1], so that p(n,k) = (k + 1)/(n + 2). Refer to Figure D4-2 for situations at
decision nodes.

At the final decision node, j = N = 2, and (n,k) = (0,0), (1,0), or (1,1).
Determine $\varphi_2(0,0)$, $\varphi_2(1,0)$, $\varphi_2(1,1)$ and the optimum action in each case.

p(0,0) = 1/2, p(1,0) = 1/3, p(1,1) = 2/3

$\varphi_2(0,0) = \max\{\tfrac{1}{2} g + 0, s\} = \max\{5/4, 1\} = 5/4$    (risk)
$\varphi_2(1,0) = \max\{\tfrac{1}{3} g + 0, s\} = \max\{5/6, 1\} = 1$    (safe)
$\varphi_2(1,1) = \max\{\tfrac{2}{3} g + 0, s\} = \max\{5/3, 1\} = 5/3$    (risk)

At the initial decision node, j = 1 and (n,k) = (0,0)

$\varphi_1(0,0) = \max\{[g + \varphi_2(1,1)] \tfrac{1}{2} + \varphi_2(1,0) \tfrac{1}{2}, \; s + \varphi_2(0,0)\}$
    $= \max\{[5/2 + 5/3] \tfrac{1}{2} + 1/2, \; 1 + 5/4\}$
    $= \max\{31/12, 27/12\} = 31/12$    (risk).

The indicated strategy is:
First decision: Risk ~ $\varphi_1(0,0) = 31/12$
Second decision: If first risk is successful ~ $\varphi_2(1,1)$: Risk.
                 If first risk unsuccessful ~ $\varphi_2(1,0)$: Safe.
The expected gain from this strategy is $\varphi_1(0,0) = 31/12 \approx 2.58$.
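
The backward induction of Example D4-a is easily mechanized. The sketch below is one possible arrangement, assuming the uniform prior, so that p(n,k) = (k + 1)/(n + 2); the function names are ours. It evaluates the recursion $\varphi_j(n,k) = \max\{\text{risk value}, \text{safe value}\}$ with $\varphi_{N+1} = 0$, using exact fractions.

```python
from fractions import Fraction
from functools import lru_cache

def solve(g, s, N):
    """Return phi(j, n, k) -> (expected optimum gain, action) for Example D4-a."""

    def p(n, k):
        # P(success on next risk | k successes in n risks), uniform prior on H.
        return Fraction(k + 1, n + 2)

    @lru_cache(maxsize=None)
    def phi(j, n, k):
        if j > N:                       # no periods left: phi_{N+1} = 0
            return Fraction(0), None
        risk = ((g + phi(j + 1, n + 1, k + 1)[0]) * p(n, k)
                + phi(j + 1, n + 1, k)[0] * (1 - p(n, k)))
        safe = s + phi(j + 1, n, k)[0]
        # Decision rule: risk iff the risk value strictly exceeds the safe value.
        return (risk, "risk") if risk > safe else (safe, "safe")

    return phi

phi = solve(g=Fraction(5, 2), s=Fraction(1), N=2)
print(phi(1, 0, 0))                             # initial decision node
print(phi(2, 1, 1), phi(2, 1, 0), phi(2, 0, 0)) # final decision nodes
```

Calling `solve` with a larger N extends the same recursion to longer horizons without further change.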






5. Proofs of the basic properties

To establish the equivalence of Properties CI1) through CI4), we
show CI1) <=> CI2) => CI3) => CI4) => CI2). To simplify writing, we drop
the "a.s." in the step-by-step arguments.
CI1) => CI2)

$E\{I_N(Y) E[I_M(X)|Z] \,|\, Z\} = E[I_M(X)|Z] \, E[I_N(Y)|Z]$    by CE8)
    $= E[I_M(X) I_N(Y) | Z]$    by CI1)
    $= E\{E[I_M(X) I_N(Y) | Z,Y] \,|\, Z\}$    by CE9)
    $= E\{I_N(Y) E[I_M(X)|Z,Y] \,|\, Z\}$    by CE8).

Now

$E(I_Q(Z) E\{I_N(Y) E[I_M(X)|Z] \,|\, Z\}) = E\{I_Q(Z) I_N(Y) E[I_M(X)|Z]\}$    for all Borel Q    by CE1).

A similar expression holds for all Borel Q with $E[I_M(X)|Z]$ replaced
by $E[I_M(X)|Z,Y]$. We thus have

$E\{I_Q(Z) I_N(Y) E[I_M(X)|Z]\} = E\{I_Q(Z) I_N(Y) E[I_M(X)|Z,Y]\}$ for all Borel sets
N, Q on the codomains of Y, Z, respectively. By E6b), we may assert

$E[I_M(X)|Z] = e_1(Z,Y) = e_2(Z,Y) = E[I_M(X)|Z,Y]$ a.s.

CI2) => CI1)

$E[I_M(X) I_N(Y) | Z] = E\{E[I_M(X) I_N(Y) | Z,Y] \,|\, Z\}$    by CE9)
    $= E\{I_N(Y) E[I_M(X)|Z,Y] \,|\, Z\}$    by CE8)
    $= E\{I_N(Y) E[I_M(X)|Z] \,|\, Z\}$    by CI2)
    $= E[I_M(X)|Z] \, E[I_N(Y)|Z]$    by CE8).



CI2) => CI3)

$E[I_M(X) I_Q(Z) | Z,Y] = I_Q(Z) E[I_M(X)|Z,Y]$    by CE8)
    $= I_Q(Z) E[I_M(X)|Z]$    by CI2)
    $= E[I_M(X) I_Q(Z) | Z]$    by CE8).

CI3) => CI4)

$E[I_M(X) I_Q(Z) | Y] = E\{E[I_M(X) I_Q(Z) | Z,Y] \,|\, Y\}$    by CE9)
    $= E\{E[I_M(X) I_Q(Z) | Z] \,|\, Y\}$    by CI3).

CI4) => CI2)

$E\{E[I_M(X) I_Q(Z) | Z] \,|\, Y\} = E[I_M(X) I_Q(Z) | Y]$    by CI4)
    $= E\{E[I_M(X) I_Q(Z) | Z,Y] \,|\, Y\}$    by CE9).

This ensures that for all Borel N on the codomain of Y

$E\{I_N(Y) E[I_M(X) I_Q(Z) | Z]\} = E\{I_N(Y) E[I_M(X) I_Q(Z) | Z,Y]\}$    by CE1).

But this, in turn, ensures that

$E\{I_N(Y) I_Q(Z) E[I_M(X)|Z]\} = E\{I_N(Y) I_Q(Z) E[I_M(X)|Z,Y]\}$    by CE8).

By E6b), we must have

$E[I_M(X)|Z] = E[I_M(X)|Z,Y]$ a.s., which is CI2).

We wish to establish next the equivalence of CI5) through CI8) to
the propositions above. It is apparent by the special-case relationship
that CI5) => CI1), CI7) => CI6) => CI2), and CI8) => CI4). Extension of
CI1) to CI5) may be done by a "standard argument" based on linearity,
monotonicity, monotone convergence, and approximation by step functions.
Extension of CI3) to CI7) may be achieved by an argument similar to
that sketched in the discussion of the proof of CE10), plus a "standard
argument." A similar approach serves to extend CI4) to CI8).


Before proving CI9), we obtain a lemma useful here and elsewhere.
Lemma D5-1

If $E[g(W)|V,U] = E[g(W)|V]$ a.s. and Z = h(U), with h Borel,
then $E[g(W)|V,Z] = E[g(W)|V]$ a.s.
PROOF

The random vector (V,Z) = (V,h(U)) is a Borel function of (V,U). Hence
$E[g(W)|V,Z] = E\{E[g(W)|V,U] \,|\, V,Z\}$ a.s.    by CE9)
    $= E\{E[g(W)|V] \,|\, V,Z\}$ a.s.    by hypothesis
    $= E[g(W)|V]$ a.s.    by CE9a).



PROOF OF CI9)

For any Borel function g,
$E[g(X)|Z] = E[g(X)|Z,Y]$ a.s.    by CI6)
    $= E[g(X)|Z,V]$ a.s.    by Lemma D5-1.
Hence, {X,V} is conditionally independent, given Z,    by CI6).
For any Borel function r,
$E[r(V)|Z] = E[r(V)|Z,X]$ a.s.    by CI6)
    $= E[r(V)|Z,U]$ a.s.    by Lemma D5-1.
Hence, {U,V} is conditionally independent, given Z,    by CI6).


PROOF OF CI10)

a) $E[g(X)h(Y)] = E\{E[g(X)h(Y)|Z]\}$    by CE1b)
       $= E\{E[g(X)|Z] \, E[h(Y)|Z]\}$    by CI5)
       $= E[e_1(Z) e_2(Z)]$    (notational change)

b) $E[g(X) | Y \in N] P(Y \in N) = E[I_N(Y) g(X)]$    by CE1a)
       $= E\{E[I_N(Y)|Z] \, E[g(X)|Z]\}$    by part a).


PROOF OF CI11)

Given that {Y,(X,Z)} is independent,

$P(X \in M, Y \in N, Z \in Q) = E[I_M(X) I_N(Y) I_Q(Z)]$    by E1a)
    $= E\{E[I_M(X) I_N(Y) I_Q(Z) | Z]\}$    by CE1b)
    $= E\{I_Q(Z) E[I_M(X) I_N(Y) | Z]\}$    by CE8).

Also,

$P(X \in M, Y \in N, Z \in Q) = P(Y \in N) P(X \in M, Z \in Q)$    by independence
    $= E[I_N(Y)] \, E[I_M(X) I_Q(Z)]$    by E1a)
    $= E[I_N(Y)] \, E\{I_Q(Z) E[I_M(X)|Z]\}$    by CE1)
    $= E\{I_Q(Z) E[I_N(Y)] E[I_M(X)|Z]\}$    by E2)
    $= E\{I_Q(Z) E[I_N(Y)|Z] E[I_M(X)|Z]\}$    by CE5).

Equating the last expressions in each series of equalities, by E6)
we conclude that $E[I_M(X) I_N(Y) | Z] = E[I_M(X)|Z] \, E[I_N(Y)|Z]$ a.s.
PROOF OF CI12)

As in the proof of CE10), it is sufficient to show the proposition holds
for $g = I_{M \times N} = I_M I_N$.

$E[I_M(X) I_N(Y) | Y = u, Z = v] = I_N(u) E[I_M(X) | Y = u, Z = v]$    by CE8)
    $= I_N(u) E[I_M(X) | Z = v]$    by CI6)
    $= E[I_M(X) I_N(u) | Z = v]$ a.s. $[P_{YZ}]$    by CE2).



6. Problems

D-1 Show that if {X,(Y,Z)} is independent, then
    $E[g(X)h(Y)|Z] = E[g(X)] \, E[h(Y)|Z]$ a.s.
D-2 Let $\{X_i : 1 \le i \le n\}$ be a random sample, given H. Determine the
    best mean-square estimate for H, given $W = (X_1, X_2, \ldots, X_n)$, for
    each of the following cases:
      i) X is delayed exponential: $f_{X|H}(t|u) = e^{-(t-u)}$ for $t \ge u$, and
         H is exponential (1): $f_H(u) = e^{-u}$ for $u \ge 0$
     ii) X is Poisson (u): $p_{X|H}(k|u) = e^{-u} u^k / k!$, k = 0, 1, 2, ..., and
         H is gamma $(m,\lambda)$: $f_H(u) = \lambda^m u^{m-1} e^{-\lambda u}/(m-1)!$, u > 0, m > 0, $\lambda$ > 0
    iii) X is geometric (u): $p_{X|H}(k|u) = u(1-u)^k$, k = 0, 1, 2, ..., and
         H is uniform [0,1].

D-3 In Example D2-b, suppose a = 7, b = 3. Compare the prior density
    for H and the quantities $E[H | S_{10} = 8]$ and $\mathrm{Var}[H | S_{10} = 8]$ with
    those for the case a = 2, b = 1, as in the example.

D-4 Consider the demand random variable of problem C-13:

    $D = \sum_{i=1}^{N} X_i.$

    Suppose $\{N, (H, X_1, X_2, \ldots, X_n)\}$ is independent for each $n \ge 1$, and
    $E[X_i | H = u] = e(u)$, invariant with i. Show that $E[D|H] = E[N] e(H)$.

D-5 It is desired to study the waiting time for the arrival of an
    ambulance after reporting an accident (see Scott, et al. [1978]). Direct
    statistical data are difficult to obtain. Suppose we consider the
    random variables
      N = number of ambulances in service (integer-valued)
      D = distance traveled by dispatched ambulance
      V = average velocity of the ambulance for the trip.
    By considering the geometry of the deployment scheme, it is possible
    to make reasonable assumptions about $P(D \le t | N = n)$. Also, it is
    possible to make reasonable assumptions, on the basis of statistical
    data, for the distribution of V and the distribution of N. We
    have W = D/V, where W is the random variable whose value is the
    waiting time. Then $P(W \le t) = E[I_Q(D,V)]$, where $Q = \{(u,v) : u \le vt\}$.

    a) Show that if {V,(D,N)} is independent, then
       $P(W \le t | N = n) = \int P(D \le vt | N = n) \, dF_V(v).$
       Suggestion. Use CE1b), CI11), CI12).

    b) Under the conditions for part a) and the assumptions
         i) $P(D \le s | N = n) = as$, $0 \le s \le 1/a$, where $a^2 = n\pi/A$
        ii) V is uniform [15,25]
       where A is the area served, in square miles, D is distance in
       miles, and V is velocity, in miles per hour,



    c) Repeat part b) with i) replaced by
         i') $P(D \le s | N = n) = 1 - e^{-as}$, $s \ge 0$, $a^2 = n\pi/A$.

D-6 In Example D3-b, suppose the previous demand was 26 units. What is
    the optimum action?

D-7 An electronic game is played as follows. A probability of success in
    a sequence of Bernoulli trials is selected at random. A player is
    allowed to observe the result of m trials. He is then to guess the
    number of successes in the next n trials. If he guesses within
    one of the actual number of successes, he gains one dollar (loses -1);
    if his guess misses by two or more, he loses one dollar. Suppose
    m = 3, n = 10; on the trial run there are two out of three successes.
    What number should he then guess to minimize his expected loss? Let
      X = number of successes in m on the trial run
      Y = number of successes in n on the pay run
      H = parameter random variable.
    Then X is binomial (m,u), given H = u
         Y is binomial (n,u), given H = u
         H is uniform on [0,1]
    and $L(a,y) = -1$ for $|a - y| \le 1$, $L(a,y) = 1$ for $|a - y| > 1$.

D-8 In Example D4-a, determine the optimum strategy for g = 5/2, s = 1,
    N = 3, H uniform on [0,1].


E. Markov Processes and Conditional Independence

1. Discrete-Parameter Markov Processes
2. Markov Chains with Costs and Rewards
3. Continuous-Parameter Markov Processes
4. The Chapman-Kolmogorov Equation
5. Proof of a Basic Theorem on Markov Processes
6. Problems


1. Discrete-parameter Markov processes

The notion of conditional independence has been utilized extensively
in advanced treatments of Markov processes. Such processes appear as
models of processes without "memory." The "future" is conditioned only
by the "present" and not by the manner in which the present state is reached.
The past thus affects the future only as it influences the present. We
wish to make the connection between the usual introductory treatment and
the more advanced point of view, which is not only mathematically powerful
but intuitively helpful in displaying the essential character of Markov
processes. For a recent introductory treatment utilizing conditional
independence, see Çinlar [1975].

Many elementary textbooks include a treatment of Markov processes with
discrete parameter and finite, or at most countably infinite, state space.
Suppose we have a sequence $\{X_n: 0 \le n\}$ of random variables, each with
range $\{0, 1, 2, \ldots, N\}$. Thus, the parameter set is $T = \{0, 1, 2, \ldots\}$
and the state space is $S = \{0, 1, 2, \ldots, N\}$. The Markov property is
expressed by the condition

M) $P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i) = p_{ij}(t)$

for all $t \ge 1$, all $(i,j) \in S^2$, all $(i_0, i_1, \ldots, i_{t-1}) \in S^t$.
The quantities $p_{ij}(t)$ are called the transition probabilities. In the
important case of stationary (or homogeneous) transition probabilities,
we have $p_{ij}(t) = p_{ij}$, invariant with t. In this case, analysis is
largely algebraic, with the transition matrix $P = [p_{ij}]$ playing a
central role.
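As an illustration that is not part of the original text, the following Python sketch simulates a chain with stationary transition probabilities. The function name `simulate_chain` and the two-state matrix are hypothetical choices made only for this example. The point it shows is that each new state is drawn from the row of P indexed by the present state alone, which is the content of condition M).

```python
import random

def simulate_chain(P, x0, n_steps, rng):
    """Return [x_0, x_1, ..., x_n] for a chain with stationary transition matrix P."""
    path = [x0]
    for _ in range(n_steps):
        i = path[-1]
        # The next state is drawn from row i only; the earlier path is not consulted.
        path.append(rng.choices(range(len(P)), weights=P[i])[0])
    return path

# Hypothetical two-state transition matrix, used only for illustration.
P = [[0.9, 0.1],
     [0.5, 0.5]]
path = simulate_chain(P, 0, 10, random.Random(0))
print(path)
```

Passing the random generator as an argument keeps a run reproducible, which is convenient when comparing simulated frequencies with the entries of P.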

The fundamental notion of the Markov property M) is that the past
does not condition the future, except as it influences the present. We can
give the Markov property M) an alternative formulation which emphasizes
the conditional independence of past and future, given the present, without
restriction to discrete state space or to stationary transition probabilities.
To aid in formulating this condition, we introduce the following notation.

If S is the state space, then

$S^k$ = set of all k-tuples of elements of state space S

$U_s = (X_0, X_1, \ldots, X_s): \Omega \to S^{s+1}$
$V_{s,t} = (X_s, X_{s+1}, \ldots, X_t): \Omega \to S^{t-s+1}$, $s \le t$
$W_{t,u} = (X_t, X_{t+1}, \ldots, X_u): \Omega \to S^{u-t+1}$, $t \le u$.



We indicate by $U_s^*$ a random vector whose coordinates consist of a subset
(in natural order) of the coordinates of $U_s$, and similarly for $V_{s,t}^*$ and
$W_{t,u}^*$. Then $U_s^*$, $V_{s,t}^*$, and $W_{t,u}^*$ are continuous, hence Borel, functions
of $U_s$, $V_{s,t}$, and $W_{t,u}$, respectively. When we write a function $g(U_s)$,
$h(V_{s,t})$, etc., we suppose g, h, etc. are real-valued Borel functions
such that $E[g(U_s)]$, $E[h(V_{s,t})]$, etc. are all finite.

If t represents the "present", then $U_{t-1}$ represents the "past
behavior" of the process and $W_{t+1,u}$ represents the behavior of the
process for a "finite future." We sometimes consider an "extended present",
represented by $V_{s,t}$, $s \le t$.

In this notation, the Markov property M) is equivalent to

$P(X_{t+1} \in M \mid X_t = u, U_{t-1} = v) = P(X_{t+1} \in M \mid X_t = u)$
$\forall\, t \ge 1$, $\forall$ Borel sets $M \subset S$, $\forall\, u \in S$, $\forall\, v \in S^t$

which is equivalent to

M) $E[I_M(X_{t+1}) \mid X_t, U_{t-1}] = E[I_M(X_{t+1}) \mid X_t]$ a.s. $\forall\, t \ge 1$, $\forall$ Borel $M \subset S$.
Reference to CI4) shows property M) is equivalent to

M') $\{X_{t+1}, U_{t-1}\}$ is conditionally independent, given $X_t$, $\forall\, t \ge 1$.






DEFINITION. The process $\{X_t: t \in T\}$ is Markov iff M') holds.
Use of CI1) through CI8) provides a number of alternative formulations
of the basic condition M). It is sometimes desirable to remove the
restriction to the immediate future. This can be done (and more), as the
following theorem shows.
Theorem E1-1

A process $\{X_t: t \in T\}$, $T = \{0, 1, 2, \ldots\}$ is Markov iff

M'') $\{W_{t+1,t+n}^*, U_{s-1}^*\}$ is conditionally independent, given any finite
extended present $V_{s,t}$, $1 \le s \le t$, any $n \ge 1$, any $W_{t+1,t+n}^*$, $U_{s-1}^*$.

A proof is given in Sec E5.

To see how the idea of conditional independence is an aid to modeling,
we consider several examples.
Example E1-a One-dimensional random walk

A number of physical and behavioral situations can be represented
schematically as "random walks." A particle is positioned on a line. At discrete
instants of time $t_1, t_2, \ldots$ the particle moves an amount represented
by the values of the random variables $Y_1, Y_2, \ldots$, respectively.
Positive values indicate movements in one direction and negative values
indicate movements in the opposite direction. The position after the nth
move is $X_n = Y_1 + Y_2 + \cdots + Y_n$ (we take $X_0 = 0$). If we can assume
the class $\{Y_i: 1 \le i\}$ is independent, then $X_{n+1} = X_n + Y_{n+1}$, with
$\{Y_{n+1}, (U_{n-1}, X_n)\}$ independent for all $n \ge 0$. Since the position at time
$t_{n+1}$ is affected by the past behavior only as that behavior affects the
present position $X_n$ (at time $t_n$), it seems reasonable to suppose that
the Markov condition holds.




Example E1-b A class of branching processes

Consider a population consisting of "individuals" able to produce new
individuals of the same kind. We suppose the production of new individuals
occurs at specific instants for a whole "generation." To avoid
the complication of a possibly infinite population in some generation,
we suppose a mechanism operates to limit the total population to M
individuals at any time. Let $X_0$ be the original population and suppose
the number of individuals produced by each individual in a given generation
is a random variable. Let $Z_{in}$ be the random variable whose value is
the number of individuals produced by the ith member of the nth generation.
If $Z_{in} = 0$, that individual does not survive; if $Z_{in} = 1$, either
that individual survives and produces no offspring or does not survive and
produces one offspring. If $X_n$ is the number of individuals in the nth
generation, then

$X_{n+1} = \min\{M, \sum_{i=1}^{X_n} Z_{in}\} = g(X_n, Y_{n+1})$, where $Y_{n+1} = (Z_{1n}, Z_{2n}, \ldots, Z_{Mn})$.

If $\{Z_{in}: 1 \le i \le M,\ 0 \le n < \infty\}$ is an independent class, then
$\{Y_{n+1}, (U_{n-1}, X_n)\}$ is an independent pair for any $n \ge 0$. Again we have a
situation in which past behavior affects the future only as it affects the
present. It seems reasonable to suppose the process $\{X_n: 0 \le n\}$ is
Markov.

Example E1-c An inventory problem

A store uses an (m,M) inventory policy for a certain item. This means:

If the stock at the end of a period is less than m, "order up" to M.
If the stock at the end of the period is as much as m, do not order.
Suppose the merchant begins the first period with a stock of M units.

Let $X_n$ be the stock at the end of the nth period ($X_0 = M$). If the
demand during the nth period is $D_n$, then

$X_{n+1} = \begin{cases} \max\{(M - D_{n+1}), 0\} & \text{if } 0 \le X_n < m \\ \max\{(X_n - D_{n+1}), 0\} & \text{if } m \le X_n \le M \end{cases} \;=\; g(X_n, D_{n+1})$.

If we suppose $\{D_n: 1 \le n\}$ is an independent class, then we have
$\{D_{n+1}, (U_{n-1}, X_n)\}$ an independent pair for each $n \ge 0$. Once more
it seems the past and future should be conditionally independent, given
the present.

Each of these examples provides a special case of the following
Theorem E1-2

Suppose $\{Y_n: 1 \le n\}$ is an independent class of random vectors. Set
$X_0 = c$ (a constant) and for $n \ge 0$ let $X_{n+1} = g_{n+1}(X_n, Y_{n+1})$. Then
the process $\{X_n: 0 \le n\}$ is Markov and
$P(X_{n+1} \in Q \mid X_n = u) = P[g_{n+1}(u, Y_{n+1}) \in Q]$ $\forall\, n \ge 0$, $\forall\, u \in S$, $\forall$ Borel set Q.



PROOF

$X_k$ and $U_k = (X_0, X_1, \ldots, X_k) = h_k(Y_1, Y_2, \ldots, Y_k)$, $1 \le k \le n$. Thus,
$\{Y_{n+1}, (U_{n-1}, X_n)\}$ is independent. By property CI11), $\{Y_{n+1}, U_{n-1}\}$ is
conditionally independent, given $X_n$. Hence, we have for any n, any Borel
set Q,

$E[I_Q(X_{n+1}) \mid X_n, U_{n-1}] = E\{I_Q[g_{n+1}(X_n, Y_{n+1})] \mid X_n, U_{n-1}\}$
$= E\{I_Q[g_{n+1}(X_n, Y_{n+1})] \mid X_n\}$   by CI7)
$= E[I_Q(X_{n+1}) \mid X_n]$

which establishes the Markov property. Now

$P(X_{n+1} \in Q \mid X_n = u) = E[I_Q(X_{n+1}) \mid X_n = u]$
$= E\{I_Q[g_{n+1}(X_n, Y_{n+1})] \mid X_n = u\}$
$= E\{I_Q[g_{n+1}(u, Y_{n+1})]\}$   by CE11)
$= P[g_{n+1}(u, Y_{n+1}) \in Q]$   by E1a).
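The construction in Theorem E1-2 can be sketched in a few lines of Python. This sketch is an illustration added here, not part of the original text. The names `run_recursion`, `walk_step`, and `make_inventory_step` are hypothetical, the driving sequences are made-up values, and for simplicity a single function g is used for every n (the theorem allows g to vary with n).

```python
def run_recursion(g, c, ys):
    """Return [X_0, ..., X_n] with X_0 = c and X_{k+1} = g(X_k, Y_{k+1})."""
    xs = [c]
    for y in ys:
        # Only the present value xs[-1] and the new input y are used.
        xs.append(g(xs[-1], y))
    return xs

def walk_step(x, y):
    # Example E1-a: X_{n+1} = X_n + Y_{n+1}
    return x + y

def make_inventory_step(m, M):
    # Example E1-c: (m, M) inventory policy
    def step(x, d):
        start = M if x < m else x   # order up to M only when stock is below m
        return max(start - d, 0)
    return step

# Made-up input sequences, for illustration only.
walk = run_recursion(walk_step, 0, [1, -1, -1, 1, 1])
stock = run_recursion(make_inventory_step(1, 3), 3, [1, 2, 0, 4, 1])
print(walk)
print(stock)
```

Writing each model as a step function g makes the structural hypothesis of the theorem visible: the past enters the update only through the present state.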






If $g_{n+1} = g$, invariant with n, and if $\{Y_n: 1 \le n\}$ is independent,
identically distributed, then $P(X_{n+1} \in Q \mid X_n = u) = P[g(u, Y_{n+1}) \in Q]$
is invariant with n. To illustrate, we consider the inventory problem
above (c.f. Hillier and Lieberman [1974], Sees 5,17, $.18). 
Example E1-c (continued)

Suppose m = 1, M = 3, and $D_n$ has the Poisson distribution with $\lambda = 1$.
Then the state space $S = \{0, 1, 2, 3\}$ and $P(X_{n+1} = j \mid X_n = i) = P[g(i, D_{n+1}) = j]$.
$g(0, D_{n+1}) = \max\{(3 - D_{n+1}), 0\}$.
Since $g(0, D_{n+1}) = 0$ iff $D_{n+1} \ge 3$,

$P(X_{n+1} = 0 \mid X_n = 0) = P(D_{n+1} \ge 3) = 0.0803$ (from table).

Since $g(0, D_{n+1}) = 1$ iff $D_{n+1} = 2$,

$P(X_{n+1} = 1 \mid X_n = 0) = P(D_{n+1} = 2) = 0.1839$ (from table).
Continuing in this way, we determine each transition probability and hence
the transition probability matrix

$P = \begin{bmatrix} 0.0803 & 0.1839 & 0.3679 & 0.3679 \\ 0.6321 & 0.3679 & 0 & 0 \\ 0.2642 & 0.3679 & 0.3679 & 0 \\ 0.0803 & 0.1839 & 0.3679 & 0.3679 \end{bmatrix}$

The calculation procedure based on the equation $P(X_{n+1} = j \mid X_n = i) = P[g_{n+1}(i, Y_{n+1}) = j]$
can be justified in elementary terms for many special
cases. The general result in Theorem E1-2 shows how the desired conditional
independence of the past and future, given the present, arises out of the
independence of the sequence $\{Y_n: 1 \le n\}$ and establishes the validity of
the calculation procedure in any situation (including continuous state
space).
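The table lookups in this example can be reproduced numerically. The following Python sketch is an added illustration, and the function names `poisson_pmf` and `inventory_transition_matrix` are hypothetical. It evaluates P[g(i, D) = j] row by row for the (m, M) policy, computing the Poisson probabilities directly instead of reading them from a table.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def inventory_transition_matrix(m, M, lam):
    """Rows are the present stock i, columns the next stock j, for an (m, M) policy."""
    P = [[0.0] * (M + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        start = M if i < m else i            # stock available after any reorder
        for j in range(1, start + 1):
            # next stock j >= 1 requires demand exactly start - j
            P[i][j] = poisson_pmf(start - j, lam)
        # next stock 0 collects every demand >= start
        P[i][0] = 1.0 - sum(P[i][1:])
    return P

P = inventory_transition_matrix(1, 3, 1.0)
for row in P:
    print(["%.4f" % p for p in row])
```

With m = 1, M = 3, and λ = 1 the rows agree with the matrix of the example to four decimal places.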






2. Markov chains with costs and rewards

In a variety of sequential decision making situations, the progression
of states of the system in successive time periods can be represented in a
useful way by a Markov process. The Markov character arises from the
"memoryless" nature of the process. Often such sequential systems have a reward
structure. Associated with each possible transition from one state to another
is a "reward" (which may be negative). Consider the following classical
example, utilized by Howard [1960] in his pioneering work in the area.
Example E2-a

The manufacturer of a certain item finds the market either "favorable" or
"unfavorable" to his product in a given sales period. These conditions may be
represented as state 0 or state 1, respectively. If the market is favorable
in one period and is again favorable in the next period (transition from state
0 to state 0), the manufacturer's earnings are $r_{00}$. If the market is favorable
in one period and unfavorable in the next (transition from state 0 to state 1),
the earnings for the period are a smaller amount $r_{01}$. Similarly, the other
possibilities have associated rewards. If the succession of states can be
modeled by a Markov chain with stationary transition probabilities, then the
system is characterized by two entities: the transition probability matrix P
and the reward matrix R, given by

$P = \begin{bmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{bmatrix}$  and  $R = \begin{bmatrix} r_{00} & r_{01} \\ r_{10} & r_{11} \end{bmatrix}$.

We may express a general model for such a system as follows:

Let $\{X_n: 0 \le n\}$ be a discrete-parameter Markov process with finite state
space S. The reward structure is expressed by the sequence $\{R_n: 1 \le n\}$
of random variables

A) $R_{n+1} = r(X_n, X_{n+1})$.

We are assuming that neither the reward structure nor the transition
probabilities change with time. While more general situations could be
modeled, we use the time-invariant case in subsequent developments.

Let $q_i = E[R_{n+1} \mid X_n = i]$ = expected reward in the next period, given the
present state is i.

Then $q_i = E[r(X_n, X_{n+1}) \mid X_n = i]$
$= E[r(i, X_{n+1}) \mid X_n = i]$   by CE10)
$= \sum_j r(i,j)\, p_{ij}$.

Put $R_n^{(m)} = R_{n+1} + R_{n+2} + \cdots + R_{n+m}$ = total reward in the next m periods.

Now

A1) $E[R_{n+k} \mid X_n = i] = E\{E[R_{n+k} \mid X_{n+1}] \mid X_n = i\}$   by CI8)
$= \sum_j E[R_{n+k} \mid X_{n+1} = j]\, p_{ij}$.

From this it follows that

$E[R_n^{(m)} \mid X_n = i] = q_i + \sum_j E[R_{n+1}^{(m-1)} \mid X_{n+1} = j]\, p_{ij}$.

If we put

$v_i^{(m)} = E[R_n^{(m)} \mid X_n = i]$ (invariant with n in the stationary case)

we have

A2) $v_i^{(m)} = q_i + \sum_j p_{ij} v_j^{(m-1)}$.
A second type of reward structure is exhibited in the following
class of processes, which include inventory models of the type illustrated
in Example E1-c.

Let $\{X_n: 0 \le n\}$ be a constant Markov chain with finite state space, and
let $\{D_n: 1 \le n\}$ be an independent, identically distributed class such
that for each $n \ge 0$, $\{D_{n+1}, (X_0, X_1, \ldots, X_n)\}$ is an
independent pair. The associated reward structure is expressed by the
process $\{R_n: 1 \le n\}$, with

B) $R_{n+1} = r(X_n, D_{n+1})$.
Property CE11) shows that

$q_i = E[R_{n+1} \mid X_n = i] = E[r(i, D_{n+1})]$ (invariant with n).
The hypothesis $\{D_{n+k}, U_{n+k-1}\}$ is independent and property CI11) ensure
that $\{D_{n+k}, X_i\}$ is conditionally independent, given $X_j$, for
$0 \le i, j \le n+k-1$, $i \ne j$. For fixed n, k, let

$e(X_i) = E[R_{n+k} \mid X_i] = E[r(X_{n+k-1}, D_{n+k}) \mid X_i]$ for any $i \le n + k - 1$.
Then by CI8)

$e(X_n) = E[e(X_i) \mid X_n]$ a.s., $n \le i \le n + k - 1$.
Hence,

B1) $E[R_{n+k} \mid X_n = i] = E\{E[R_{n+k} \mid X_{n+1}] \mid X_n = i\} = \sum_j E[R_{n+k} \mid X_{n+1} = j]\, p_{ij}$.

Applying this formula for k = 2, 3, ..., m, we obtain
B2) $v_i^{(m)} = q_i + \sum_j p_{ij} v_j^{(m-1)}$, with $v_j^{(0)} = 0$.
The identity of form of A1), B1) and A2), B2) shows that the following
analysis holds for either type of reward structure.

<> 

Consider the average expected reward per period for m periods.

$E[\tfrac{1}{m} R_n^{(m)}] = \tfrac{1}{m} \sum_{i=1}^{m} E[R_{n+i}]$
$= \tfrac{1}{m} \sum_{i=1}^{m} E\{E[E[R_{n+i} \mid X_{n+i-1}] \mid X_{n-1}]\}$   by CE1b) and CI8).

Now

$E\{E[R_{n+i} \mid X_{n+i-1}] \mid X_{n-1} = k\} = \sum_j q_j p_{kj}^{(i)}$
where $p_{kj}^{(i)}$ is the i-step transition probability from k to j. Hence,

$E[\tfrac{1}{m} R_n^{(m)}] = \sum_k P(X_{n-1} = k) \sum_j q_j \left[\tfrac{1}{m} \sum_{i=1}^{m} p_{kj}^{(i)}\right]$.

If the Markov chain is constant, irreducible, aperiodic, as is usually
the case, it is known that

$\tfrac{1}{m} \sum_{i=1}^{m} p_{kj}^{(i)} \to \pi_j$ as $m \to \infty$ (invariant in k).

Here $\pi_j$ is the long-run probability that the process is in state j.
Since the limit is invariant with k, we may sum out the $P(X_{n-1} = k)$
to obtain

3) $\lim_{m \to \infty} E[\tfrac{1}{m} R_n^{(m)}] = \sum_j q_j \pi_j = g$.

A similar argument shows that for each state i

4) $\lim_{m \to \infty} \tfrac{1}{m} v_i^{(m)} = \sum_j q_j \pi_j = g$.

Here g is the average gain or reward per period, in the long run. We
illustrate by considering numerical values in the introductory examples.



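Example E2-a (continued)

Suppose

$P = \begin{bmatrix} 1/2 & 1/2 \\ 2/5 & 3/5 \end{bmatrix} = \frac{1}{10}\begin{bmatrix} 5 & 5 \\ 4 & 6 \end{bmatrix}$  and  $R = \begin{bmatrix} 9 & 3 \\ 3 & -7 \end{bmatrix}$.

To find the long-run distribution, we solve the set of equations

$5\pi_0 + 4\pi_1 = 10\pi_0$
$5\pi_0 + 6\pi_1 = 10\pi_1$

to obtain the values $\pi_0 = 4/9$ and $\pi_1 = 5/9$.

Then $q_0 = \sum_j p_{0j} r_{0j} = 6$, $q_1 = \sum_j p_{1j} r_{1j} = -3$, and
$g = \sum_i \pi_i q_i = \tfrac{4}{9}(6) + \tfrac{5}{9}(-3) = 1$.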
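The quantities in this example can be checked with exact arithmetic. The following Python sketch is an added illustration using the matrices P and R of Example E2-a (continued). The function name `total_expected_reward` is hypothetical. It computes the one-period expected rewards, the long-run distribution from the two-state balance relation, the gain g, and the m-period values from the recursion of the form A2), B2) with starting value zero.

```python
from fractions import Fraction as F

P = [[F(1, 2), F(1, 2)],
     [F(2, 5), F(3, 5)]]
R = [[9, 3],
     [3, -7]]

# q_i = sum_j p_ij r_ij, the expected reward in the next period from state i
q = [sum(P[i][j] * R[i][j] for j in range(2)) for i in range(2)]

# For two states the long-run distribution satisfies pi_0 p_01 = pi_1 p_10
# together with pi_0 + pi_1 = 1.
pi0 = P[1][0] / (P[0][1] + P[1][0])
pi = [pi0, 1 - pi0]

g = sum(pi[i] * q[i] for i in range(2))

def total_expected_reward(m):
    """v^(m) from v_i^(m) = q_i + sum_j p_ij v_j^(m-1), starting from v^(0) = 0."""
    v = [F(0), F(0)]
    for _ in range(m):
        v = [q[i] + sum(P[i][j] * v[j] for j in range(2)) for i in range(2)]
    return v

print(q, pi, g)
print([float(x) / 200 for x in total_expected_reward(200)])
```

Dividing the 200-period values by 200 gives numbers close to g for both starting states, in line with the limit statement 4).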



Example E1-c (continued)

Suppose m = 1 and M = 3, as before.
If k units are ordered, the cost is 10 + 25k, $0 < k \le M$.
If k = 0, the cost of ordering is zero.
For each unit of unsatisfied demand, a penalty of 50 is assessed.

We suppose the demand $D_n$ in period n has Poisson distribution, with
$\lambda = 1$. We may then calculate the cost function (negative of reward)

$C(X_n, D_{n+1}) = \begin{cases} 10 + 25(M - X_n) + 50 \max\{(D_{n+1} - M), 0\} & \text{for } 0 \le X_n < m \\ 50 \max\{(D_{n+1} - X_n), 0\} & \text{for } m \le X_n \le M. \end{cases}$

Thus,

$C(0, D_{n+1}) = 85 + 50\, I_{[3,\infty)}(D_{n+1})(D_{n+1} - 3)$
$C(i, D_{n+1}) = 50 \max\{(D_{n+1} - i), 0\}$ for i = 1, 2, 3.

Now

$q_0 = E[C(0, D_{n+1})] = 85 + 50\, E[I_{[3,\infty)}(D)(D - 3)]$
$= 85 + 50 \sum_{k=4}^{\infty} (k - 3) p_k$ (term for k = 3 is zero).

For the Poisson distribution $\sum_{k=n}^{\infty} k p_k = \lambda \sum_{k=n-1}^{\infty} p_k$. Hence,

$q_0 = 85 + 50\left[\sum_{k=3}^{\infty} p_k - 3 \sum_{k=4}^{\infty} p_k\right] = 86.2$ (using table for Poisson distribution).

$q_1 = E[C(1, D_{n+1})] = 50 \sum_{k=2}^{\infty} (k - 1) p_k = 50\left[\sum_{k=1}^{\infty} p_k - \sum_{k=2}^{\infty} p_k\right] = 50 p_1 = 18.4$.

Similarly, we obtain

$q_2 = E[C(2, D_{n+1})] = 50 \sum_{k=3}^{\infty} (k - 2) p_k = 5.2$

and

$q_3 = E[C(3, D_{n+1})] = 50 \sum_{k=4}^{\infty} (k - 3) p_k = 1.2$.

To obtain the long-run probabilities, we utilize the fact that the
convergence is rapid and consider $P^2, P^4, \ldots$ until results stabilize.
Direct calculation of matrix products shows that

$P^8 = \begin{bmatrix} 0.286 & 0.285 & 0.264 & 0.166 \\ 0.286 & 0.285 & 0.264 & 0.166 \\ 0.286 & 0.285 & 0.264 & 0.166 \\ 0.286 & 0.285 & 0.264 & 0.166 \end{bmatrix} \approx \lim_{m \to \infty} P^{(m)}$

from which we conclude $\pi_0 = 0.286$, $\pi_1 = 0.285$, $\pi_2 = 0.264$, and
$\pi_3 = 0.166$. These add to 1.001, indicating a small roundoff error.
Utilizing these values, we obtain

$g = \sum_j q_j \pi_j \approx 31.5$.

The treatment, once equations A1), A2) or B1), B2) and 3), 4) are
obtained, is standard. As a matter of fact, we have used examples taken
from published texts. In most standard works, the derivations are
intuitive and incomplete. We have provided a development based on
fundamental assumptions of independence and conditional independence
(or Markov conditions). Such a development should both sharpen intuition
and provide a sound mathematical basis for utilizing the models.
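The inventory computation can be repeated without tables. The following Python sketch is an added illustration, and its helper names (`pmf`, `expected_shortage`, `expected_cost`, `transition_row`, `matmul`) are hypothetical. It rebuilds the transition matrix for m = 1, M = 3, λ = 1, squares it three times to reach the eighth power, and combines a row of that power with the expected one-period costs. The infinite Poisson sums are truncated after 60 terms, which is an approximation chosen for this sketch.

```python
import math

LAM, M_CAP, REORDER = 1.0, 3, 1      # lambda, M, and m of the example

def pmf(k):
    return math.exp(-LAM) * LAM**k / math.factorial(k)

def expected_shortage(level, terms=60):
    # E[max(D - level, 0)], with the Poisson tail truncated after `terms` terms
    return sum((k - level) * pmf(k) for k in range(level + 1, level + terms))

def expected_cost(i):
    if i < REORDER:
        # order M - i units, then pay the penalty on demand beyond M
        return 10 + 25 * (M_CAP - i) + 50 * expected_shortage(M_CAP)
    return 50 * expected_shortage(i)

def transition_row(i):
    start = M_CAP if i < REORDER else i
    row = [0.0] * (M_CAP + 1)
    for j in range(1, start + 1):
        row[j] = pmf(start - j)
    row[0] = 1.0 - sum(row)
    return row

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

P = [transition_row(i) for i in range(M_CAP + 1)]
Pm = P
for _ in range(3):                   # P^2, P^4, P^8
    Pm = matmul(Pm, Pm)
pi = Pm[0]                           # any row approximates the long-run distribution
q = [expected_cost(i) for i in range(M_CAP + 1)]
g = sum(p * c for p, c in zip(pi, q))
print([round(p, 3) for p in pi], [round(c, 1) for c in q], round(g, 1))
```

Unrounded arithmetic gives values that differ slightly in the last reported digit from those obtained with table entries, which is consistent with the roundoff remark in the example.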






3. Continuous-parameter Markov processes

There are certain technical difficulties in the theory of
continuous-parameter processes. However, advanced methods show that a process can be
determined essentially for applications if all finite-dimensional
distributions are determined (i.e., if the joint distribution for any finite
subclass of the random variables is determined).

Consider a real process $\{X_t: t \ge 0\}$ (i.e., $T = [0, \infty)$). Let U, V,
W be finite subsets of T: $U = \{u_1, u_2, \ldots, u_m\}$, $V = \{v_1, v_2, \ldots, v_n\}$,
and $W = \{w_1, w_2, \ldots, w_q\}$. We suppose $u_i < u_{i+1}$, $v_j < v_{j+1}$, and
$w_k < w_{k+1}$ for all indicated i, j, k. We say U precedes V, denoted
U < V, iff every element of U is less than every element of V. We
put $X_U = (X_{u_1}, X_{u_2}, \ldots, X_{u_m})$, $X_V = (X_{v_1}, X_{v_2}, \ldots, X_{v_n})$, and
$X_W = (X_{w_1}, X_{w_2}, \ldots, X_{w_q})$.

DEFINITION. The process $\{X_t: t \ge 0\}$ is a Markov process iff for
any $U < \{v\} < \{w\}$ we have
M) $E[I_M(X_w) \mid X_v, X_U] = E[I_M(X_w) \mid X_v]$ a.s. for all Borel sets M on
the codomain of $X_w$ (i.e., in the state space S).
It is clear that condition M) is equivalent to
M') For any finite $U < \{v\} < \{w\}$, $\{X_w, X_U\}$ is conditionally independent,
given $X_v$.

As in the discrete-parameter case, we have the equivalent condition (see
Theorem E1-1)

M'') For any finite U < V < W in T, $\{X_W, X_U\}$ is conditionally
independent, given $X_V$.
These and other equivalent expressions for the conditional independence
condition provide major tools for the study of Markov processes.







Many of the Markov processes encountered in practice may be recognized
by virtue of the following property.

DEFINITION. A random process $\{X_t: t \in T\}$ has independent increments
iff for each finite subset $T_n = \{t_0, t_1, \ldots, t_n\}$ of the parameter
set T, with $t_0 < t_1 < \cdots < t_n$, the class
$\{X_{t_0}, X_{t_1} - X_{t_0}, X_{t_2} - X_{t_1}, \ldots, X_{t_n} - X_{t_{n-1}}\}$ of random variables is
independent.

Two of the most widely studied and utilized random processes have this
property.
Poisson process.

The parameter set is $T = [0, \infty)$. The process counts the number of
occurrences of some phenomenon in given time intervals. The random
variable $X_t$ counts the number of occurrences in time interval (0,t].
We set $X_0 = 0$. Then $X_t - X_s$, for s < t, is the number of occurrences
in the time interval (s,t]. The property of independent increments
models the fact that the numbers of occurrences in nonoverlapping time
intervals are independent. What happens in one interval is not affected
by and has no effect on what happens in other intervals.

Wiener process (Brownian motion).

The parameter set is $T = [0, \infty)$. $X_0 = 0$. The process is a model of
the movement along a line of a "particle" under "random disturbances."
$X_t$ is the net movement along a coordinate axis in the time interval
(0, t]. In many situations, the disturbances are of such a character that
the distances moved in disjoint time intervals may be assumed independent.
Hence, the independent-increment assumption is appropriate.






In the discrete-parameter case, the class of random walks (see
Example E1-a) possesses the independent-increment property. We have
$X_n = Y_1 + Y_2 + \cdots + Y_n$ and $X_{m+k} - X_m = Y_{m+1} + Y_{m+2} + \cdots + Y_{m+k}$. The
assumed independence of the class $\{Y_i: 1 \le i\}$ ensures independence of
the increments.

> 

We wish to show that a process with independent increments is a
Markov process. To facilitate exposition, we adopt the following
terminology and notation.

1) We say $T_n = \{t_0, t_1, \ldots, t_n\} \subset T$ is a strictly ordered, finite
subset of T iff $t_0 < t_1 < \cdots < t_n$.

2) For any strictly ordered, finite subset of T, we define the random
variables $Y_0 = X_{t_0}$ and $Y_k = X_{t_k} - X_{t_{k-1}}$ for $1 \le k \le n$,
and the random vectors $U_k = (X_{t_0}, X_{t_1}, \ldots, X_{t_k})$ and
$Z_k = (Y_0, Y_1, \ldots, Y_k)$ for each k, $1 \le k \le n$.

We note that if we have the values of the coordinates of any one of the
vectors $U_n$, $Z_n$, $(Z_{n-1}, X_{t_n})$, or $(U_{n-1}, Y_n)$ the values of the
coordinates of the others are obtained by linear transformations, which
are continuous, hence Borel. Thus, we may assert

A) Any one of the random vectors $U_n$, $Z_n$, $(Z_{n-1}, X_{t_n})$, or $(U_{n-1}, Y_n)$
is a Borel function of any one of the others.

By virtue of property CE9b), we have

B) $E[W \mid Z_n] = E[W \mid U_n] = E[W \mid U_{n-1}, X_{t_n}] = E[W \mid Z_{n-1}, X_{t_n}] = E[W \mid U_{n-1}, Y_n]$ a.s.

Also, by virtue of independence of Borel functions of independent random
vectors,

C) If any of the pairs $\{Y_{n+1}, U_n\}$, $\{Y_{n+1}, Z_n\}$, $\{Y_{n+1}, (Z_{n-1}, X_{t_n})\}$
is independent, so are the others.




With these facts, we can now establish the fundamental result
Theorem E3-1

If the process $\{X_t: t \in T\}$ has independent increments, then it is
a Markov process.
PROOF

We show that for any strictly ordered, finite $T_{n+1} \subset T$, the condition M')
holds for $X_U = U_{n-1}$, $X_v = X_{t_n}$, and $X_w = X_{t_{n+1}}$.

$g(X_{t_{n+1}}) = g(X_{t_n} + Y_{n+1}) = h(X_{t_n}, Y_{n+1})$, with h Borel, and

$E[g(X_{t_{n+1}}) \mid U_{n-1}, X_{t_n}] = E[h(X_{t_n}, Y_{n+1}) \mid Z_{n-1}, X_{t_n}]$ a.s. by proposition B).

By proposition C) and CI11), $\{Y_{n+1}, Z_{n-1}\}$ is conditionally independent,
given $X_{t_n}$. Hence,

$E[h(X_{t_n}, Y_{n+1}) \mid Z_{n-1}, X_{t_n}] = E[h(X_{t_n}, Y_{n+1}) \mid X_{t_n}]$ a.s. by CI7).

We may therefore assert

$E[g(X_{t_{n+1}}) \mid U_{n-1}, X_{t_n}] = E[g(X_{t_{n+1}}) \mid X_{t_n}]$ a.s.

which is the desired property.

The following alternate criterion for independent increments is frequently
useful as an assumption in modelling.
Theorem E3-2

A process $\{X_t: t \in T\}$ has independent increments iff for every strictly
ordered, finite $T_n \subset T$, the pair $\{Y_n, U_{n-1}\}$ is independent.
PROOF

a) If the process has independent increments, the pair $\{Y_n, Z_{n-1}\}$ is
independent. By proposition C), above, so is $\{Y_n, U_{n-1}\}$ an independent
pair.

b) Suppose $\{Y_n, U_{n-1}\}$ is independent for all $T_n$. Let $T_n$ be
arbitrarily selected, but fixed. For each k, $0 \le k \le n$, set
$T_k = \{t_0, t_1, \ldots, t_k\}$. By hypothesis, $\{Y_k, U_{k-1}\}$ is independent,
$1 \le k \le n$. By proposition C), the pair $\{Y_k, Z_{k-1}\}$ is independent,
$1 \le k \le n$. In particular, $\{Y_1, Z_0\} = \{Y_1, Y_0\}$ is independent.

Suppose for some $k \ge 2$, $\{Y_0, Y_1, \ldots, Y_{k-1}\}$ is independent. Then
by the independence of $\{Y_k, Z_{k-1}\} = \{Y_k, (Y_0, Y_1, \ldots, Y_{k-1})\}$, we
have

$P\left(\bigcap_{i=0}^{k} \{Y_i \in M_i\}\right) = P(Y_k \in M_k)\, P\left(\bigcap_{i=0}^{k-1} \{Y_i \in M_i\}\right) = P(Y_k \in M_k) \prod_{i=0}^{k-1} P(Y_i \in M_i)$.

Thus $\{Y_0, Y_1, \ldots, Y_k\}$ is independent. By mathematical induction,
the class $\{Y_0, Y_1, \ldots, Y_n\}$ is independent. Since $T_n$ is arbitrary,
the desired proposition follows.






4. The Chapman-Kolmogorov equation

For a Markov process $\{X_t: 0 \le t\}$, let $0 \le s < t < u$. Then the
pair $\{X_s, X_u\}$ is conditionally independent, given $X_t$. As a special
case of CI8), we have
CK) $E[g(X_u) \mid X_s] = E\{E[g(X_u) \mid X_t] \mid X_s\}$ a.s.

This is the Chapman-Kolmogorov equation, which plays a significant role
in the study of Markov processes.

For a chain with finite state space S, the equation takes a simple
form which is usually determined from the first form of the Markov property
in Sec E1 and elementary probability patterns. If $p_{ij}(s,t) = P(X_t = j \mid X_s = i)$,
the Chapman-Kolmogorov equation is usually written

CK') $p_{ik}(s,u) = \sum_{j \in S} p_{ij}(s,t)\, p_{jk}(t,u)$, $0 \le s < t < u$.
To see that this is a special form of CK), note that

$P(X_u = k \mid X_s = i) = E[I_{\{k\}}(X_u) \mid X_s = i]$
$= E\{E[I_{\{k\}}(X_u) \mid X_t] \mid X_s = i\}$
$= \sum_j E[I_{\{k\}}(X_u) \mid X_t = j]\, P(X_t = j \mid X_s = i)$
$= \sum_j p_{ij}(s,t)\, p_{jk}(t,u)$.

In the case of stationary transition probabilities, let $p_{ik}^{(m)}$ be the
m-step transition probability from state i to state k. CK') becomes

CK'') $p_{ik}^{(m+n)} = \sum_j p_{ij}^{(m)} p_{jk}^{(n)}$

which is the form commonly encountered in elementary treatments. In such
treatments, the transition probability matrix P plays a central role. If
$P^{(m)}$ is the matrix of m-step transition probabilities, then $P^{(m)} = P^m = PPP \cdots P$
(m factors). The Chapman-Kolmogorov equation CK'') may be
expressed compactly as

CK''') $P^{(m+n)} = P^{(m)} P^{(n)}$.
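The matrix form of the equation is easy to check on a small chain. The following Python sketch is an added illustration. It uses the two-state matrix of Example E2-a (continued), and the helper names `matmul` and `matpow` as well as the choice m = 3, n = 4 are made only for this example. Exact fractions are used, so the comparison involves no roundoff.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matpow(P, m):
    """m-step transition matrix P^(m) = P^m, for m >= 1."""
    result = P
    for _ in range(m - 1):
        result = matmul(result, P)
    return result

P = [[F(1, 2), F(1, 2)],
     [F(2, 5), F(3, 5)]]
m, n = 3, 4
lhs = matpow(P, m + n)                       # P^(m+n)
rhs = matmul(matpow(P, m), matpow(P, n))     # P^(m) P^(n)
print(lhs == rhs)
```

For a chain defined by a single matrix P the identity is simply associativity of matrix multiplication. The substance of the equation lies in the fact that the m-step probabilities of a Markov chain are given by powers of P.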






If the random variables are absolutely continuous, the Chapman-Kolmogorov
equation is often expressed in terms of conditional density functions.

CK'''') $f_{X_u|X_s}(z \mid x) = \int f_{X_u|X_t}(z \mid y)\, f_{X_t|X_s}(y \mid x)\, dy$.

In this case CK) may be written

$\int g(z) f_{X_u|X_s}(z \mid x)\, dz = \int \left[\int g(z) f_{X_u|X_t}(z \mid y)\, dz\right] f_{X_t|X_s}(y \mid x)\, dy$
$= \int g(z) \left[\int f_{X_u|X_t}(z \mid y) f_{X_t|X_s}(y \mid x)\, dy\right] dz$.

In order for this equation to hold for all Borel functions g, by an
analog to property E7) for integrals on the real line, we must have
CK'''') for each x.
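The density form can be checked numerically for a specific process. The following Python sketch is an added illustration resting on an assumption not stated in the text: the increments over an interval of length h are taken to be normal with mean 0 and variance h, as for a standard Wiener process. The function names and the numerical values of s, t, u, x, z are hypothetical. The integral over y is approximated by a plain Riemann sum.

```python
import math

def normal_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def transition_density(z, x, elapsed):
    # Assumed model: the increment over `elapsed` time units is N(0, elapsed).
    return normal_pdf(z, x, elapsed)

def ck_right_side(z, x, s, t, u, half_width=12.0, step=0.01):
    """Riemann-sum approximation of the integral over y of f(z|y) f(y|x)."""
    n = int(2 * half_width / step)
    total = 0.0
    for k in range(n + 1):
        y = x - half_width + k * step
        total += transition_density(z, y, u - t) * transition_density(y, x, t - s)
    return total * step

s, t, u, x, z = 0.0, 1.0, 2.5, 0.3, 1.1
left = transition_density(z, x, u - s)
right = ck_right_side(z, x, s, t, u)
print(left, right)
```

Under the stated assumption the two printed values agree to many decimal places, reflecting the fact that a sum of independent normal increments is again normal with the variances added.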

In spite of the importance of the Chapman-Kolmogorov equation in many
aspects of Markov process theory, it is not true that the validity of this
equation implies the process is Markov. Stated another way, it is not
true that the condition CI7) may be replaced by the condition

$E\{E[g(X) \mid Z] \mid Y\} = E[g(X) \mid Y]$ a.s. for any Borel function g. The latter
condition is not sufficient for the conditional independence of {X,Y},
given Z. W. Feller has given counterexamples. The following is taken
from Parzen [1962], p 203, but it is due essentially to Feller.
Example E4-a

Consider a sequence of containers, each with four balls, numbered one
through four. Select a ball independently, on an equally likely basis,
from each container. Let

$A_m(1)$ = event ball 1 or 4 is drawn from the mth container
$A_m(2)$ = event ball 2 or 4 is drawn from the mth container
$A_m(3)$ = event ball 3 or 4 is drawn from the mth container.

Under the usual assumptions, $P[A_m(j)] = 1/2$ for any $m \ge 1$, any
j = 1, 2, or 3. For any m (i.e., any container), we have a classical
example of a class $\{A_m(j): j = 1, 2, 3\}$ of events which is pairwise
independent, but not independent. Since selections from various containers
are independent, we assume $\{A_m(j_m): 1 \le m\}$ is an independent class for
any sequence $\{j_m: 1 \le m\}$ of elements of the set {1, 2, 3}. Thus we
may assert that $\{A_m(j): 1 \le m,\ j = 1, 2, 3\}$ is a pairwise independent
class, with $P[A_m(j)] = 1/2$ for any permissible m, j. We now form
the process $\{X_n: 1 \le n\}$ by setting

$X_{3(m-1)+j} = I_{A_m(j)}$, j = 1, 2, 3, $m \ge 1$.

This process has state space S = {0, 1}, and the members are pairwise
independent, with $P(X_n = 0) = P(X_n = 1) = 1/2$. We also have

$P(X_{n+r} = j \mid X_n = i) = P(X_{n+r} = j) = 1/2$ for any i, j ∈ {0, 1}, any
$n \ge 1$, any $r \ge 1$.

Thus, the m-step transition probability matrix is

$P^{(m)} = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}$ for any $m \ge 1$, and $P^{(m)} P^{(n)} = P^{(m+n)}$,

so the Chapman-Kolmogorov equation holds. However, the process is not
Markov, as the following argument shows. Since $A_{m+1}(1) A_{m+1}(2)$ is a
subset of $A_{m+1}(3)$, we have, for $m \ge 0$,

$P(X_{3m+3} = 1 \mid X_{3m+2} = 1, X_{3m+1} = 1) = P(A_{m+1}(3) \mid A_{m+1}(2) A_{m+1}(1)) = 1$

$\ne P(X_{3m+3} = 1 \mid X_{3m+2} = 1) = 1/2$.
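The counterexample can be verified by direct enumeration. The following Python sketch is an added illustration with hypothetical helper names (`x_values`, `prob`, `cond_prob`). It treats a single container, which carries the three consecutive variables built from one draw, and lists the four equally likely draws with exact fractions.

```python
from fractions import Fraction as F
from itertools import product

BALLS = (1, 2, 3, 4)          # equally likely draws from one container

def x_values(ball):
    # The three indicators for one container: the jth is 1 iff ball j or ball 4 is drawn.
    return tuple(1 if ball in (j, 4) else 0 for j in (1, 2, 3))

def prob(event):
    return F(sum(1 for b in BALLS if event(x_values(b))), len(BALLS))

def cond_prob(event, given):
    return prob(lambda x: event(x) and given(x)) / prob(given)

# Pairwise independence: every joint probability of two coordinates factors.
pairwise_ok = all(
    prob(lambda x: x[a] == i and x[b] == j)
    == prob(lambda x: x[a] == i) * prob(lambda x: x[b] == j)
    for a, b in ((0, 1), (0, 2), (1, 2))
    for i, j in product((0, 1), repeat=2)
)

# Conditioning on two earlier values differs from conditioning on one.
p_given_two = cond_prob(lambda x: x[2] == 1, lambda x: x[1] == 1 and x[0] == 1)
p_given_one = cond_prob(lambda x: x[2] == 1, lambda x: x[1] == 1)
print(pairwise_ok, p_given_two, p_given_one)
```

The enumeration confirms both halves of the argument: the coordinates are pairwise independent, yet knowing the two preceding values determines the third whenever both equal 1.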






5. Proof of a basic theorem on Markov processes

We utilize the notational scheme introduced in Sec E1. To prove
Theorem E1-1, we first obtain an intermediate result.
Theorem E5-1

For a Markov process $\{X_t: t \in T\}$, with $T = \{0, 1, 2, \ldots\}$, the pair
$\{X_{t+1}, U_{s-1}\}$ is conditionally independent, given any $V_{s,t}$, $1 \le s \le t$.

PROOF

We note that $U_{t-1} = (U_{s-1}, V_{s,t-1})$ and $V_{s,t} = (V_{s,t-1}, X_t)$. For any
Borel function g, any s, t, $1 \le s \le t$,

$E[g(X_{t+1}) \mid X_t] = E[g(X_{t+1}) \mid U_{t-1}, X_t]$ a.s. by M') and CI6).

By Lemma D5-1, with $V = X_t$, $U = U_{t-1}$, $Z = V_{s,t-1} = h(U_{t-1})$,

$E[g(X_{t+1}) \mid X_t] = E[g(X_{t+1}) \mid V_{s,t-1}, X_t]$
$= E[g(X_{t+1}) \mid V_{s,t}]$.

The theorem follows by CI6).

Theorem E1-1

A process $\{X_t: t \in T\}$, $T = \{0, 1, 2, \ldots\}$, is Markov iff

M'') $\{W_{t+1,t+n}^*, U_{s-1}^*\}$ is conditionally independent, given any finite
extended present $V_{s,t}$, $1 \le s \le t$, any $n \ge 1$, any $W_{t+1,t+n}^*$, $U_{s-1}^*$.

PROOF

M'') implies M') as a special case.

Suppose M') holds. We need only establish

M*) $\{W_{t+1,t+n}, U_{s-1}\}$ is conditionally independent, given $V_{s,t}$, $1 \le s \le t$, any $n \ge 1$.

The more general condition follows from CI9), with $W_{t+1,t+n}^* = h(W_{t+1,t+n})$
and $U_{s-1}^* = k(U_{s-1})$. We construct a proof by mathematical induction on
n, utilizing Theorem E5-1.

i) Since $X_{t+1} = W_{t+1,t+1}$, M*) holds for n = 1, by Theorem E5-1.
ii) Suppose M*) holds for n = k.

By Theorem E5-1, $\{X_{t+k+1}, U_t\}$ is conditionally independent, given
$W_{t+1,t+k}$. Hence, for any Borel function g, writing
$e(W_{t+1,t+k}) = E[g(W_{t+1,t+k}, X_{t+k+1}) \mid W_{t+1,t+k}]$,

$E[g(W_{t+1,t+k+1}) \mid U_{s-1}, V_{s,t}]$
$= E[g(W_{t+1,t+k}, X_{t+k+1}) \mid U_t]$
$= E\{E[g(W_{t+1,t+k}, X_{t+k+1}) \mid W_{t+1,t+k}] \mid U_t\}$   by CI8)
$= E[e(W_{t+1,t+k}) \mid U_{s-1}, V_{s,t}]$
$= E[e(W_{t+1,t+k}) \mid V_{s,t}]$   by inductive hypothesis and CI6)
$= E[g(W_{t+1,t+k+1}) \mid V_{s,t}]$ a.s.   by CI8) and CI9).

By CI6), M*) holds for n = k + 1.

iii) By mathematical induction, M*) holds for any $n \ge 1$.




6. Problems

E-1 Stopping times. In dealing with a random process $\{X_n: 0 \le n\}$ it
is sometimes desirable to consider a randomly selected member of
the process. Suppose, for example, we wish to stop the process when
a certain result (or pattern of results) is observed. This means
we select $X_n$ as the last variable iff the observed sequence
$(s_0, s_1, \ldots, s_n) \in S^{n+1}$ of results exhibits a prescribed pattern,
hence belongs to a certain subset $M_n$ of $S^{n+1}$. We use this to
formalize the notion as follows:

DEFINITION. A nonnegative, integer-valued random variable T is
called a stopping time for the process $\{X_n: 0 \le n\}$ iff the event
$A_k = \{\omega: T(\omega) = k\}$ is determined by $U_k = (X_0, X_1, \ldots, X_k)$. Thus,
$A_k = U_k^{-1}(M_k)$, with $A_k A_j = \emptyset$ for $k \ne j$. We assume $\sum_{k=0}^{\infty} P(A_k) = 1$,
which means that with probability one T is finite.

It is apparent that $T = \sum_{k=0}^{\infty} k\, I_{A_k} = \sum_{k=0}^{\infty} k\, I_{M_k}(U_k)$ a.s.

a) Suppose $X_n$ is the value of a critical dimension of the nth
item from a production line. The desired value is a. The
process is stopped for readjustment whenever $|X_n - a| > b$. Show
that if T is the random variable which designates the number of
the item at which the line is stopped, then T is a stopping
time for the process.
Suggestion. Express $M_k$ in terms of the coordinate sets
[a - b, a + b].

b) Show that if the $X_n$ are integer-valued, the random variable $T_1$
defined by $T_1(\omega) = \min\{n \ge 0: X_n(\omega) = i\}$ is a stopping time.

c) Show that if $T_1$ is a stopping time for an integer-valued process,
so is $T_2$ defined by $T_2(\omega) = \min\{n > T_1(\omega): X_n(\omega) = i\}$.
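The idea behind part a) of Problem E-1 can be made concrete with a small computation. The following Python sketch is an added illustration and not a solution of the problem. The function names `first_exit_time` and `stops_at` and the measurement values are hypothetical. The second function decides whether T = k by inspecting only the values up to index k, which is the defining feature of a stopping time.

```python
def first_exit_time(xs, a, b):
    """Index of the first item with |x - a| > b, or None if the line never stops."""
    for n, x in enumerate(xs):
        if abs(x - a) > b:
            return n
    return None

def stops_at(prefix, a, b):
    """Decide whether T = k from (x_0, ..., x_k) alone, where k = len(prefix) - 1."""
    *earlier, last = prefix
    # T = k iff the first k values lie in [a - b, a + b] and the last one does not.
    return all(abs(x - a) <= b for x in earlier) and abs(last - a) > b

# Made-up measurements, for illustration only.
xs = [10.0, 10.1, 9.8, 10.6, 10.0, 11.0]
T = first_exit_time(xs, a=10.0, b=0.5)
print(T, stops_at(xs[:T + 1], 10.0, 0.5))
```

Later values such as the final 11.0 play no part in the decision, since the line has already stopped by then.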




E-2  Suppose T is a stopping time for the process {X_n: 0 ≤ n}. Let
     U_T = (X_0, X_1, ..., X_T). The expressions g(U_T) and E[g(Y)|U_T]
     must be interpreted, since the dimension of random vector U_T changes
     with the value of T. If Q ⊂ S^∞ and Q(k) is the projection onto S^{k+1},
     we set

        I_Q(U_T) = Σ_k I_{M_k}(U_k) I_{Q(k)}(U_k)

     then  [expression illegible in the source]

     Show that E[g(Y)|U_T] = Σ_k E[g(Y)|U_k] I_{M_k}(U_k) a.s.

E-3  Strong Markov property. Suppose {X_n: 0 ≤ n} is a Markov process and
     T is a stopping time for the process.

     a) Show that E[g(W_{T,T+n})|U_T] = E[g(W_{T,T+n})|X_T]

     b) If the process is homogeneous, show that
        E[g(W_{T,T+n})|X_T] = E[g(W_{0,n})|X_0]
E-4  Martingales. The following class of random processes has many
     connections with the class of Markov processes (cf. Karlin and Taylor
     [1975], Chap 6).

     DEFINITION. Let {X_n: 0 ≤ n} be a sequence of real random variables
     and {Y_n: 0 ≤ n} be a sequence of random vectors. Then {X_n: 0 ≤ n}
     is a martingale with respect to {Y_n: 0 ≤ n} iff i) E[|X_n|] is
     finite for each n ≥ 0, and ii) E[X_{n+1}|Y_0, Y_1, ..., Y_n] = X_n a.s.
     for each n ≥ 0.

     Note that conditions i) and ii) imply iii) X_n = e_n(Y_0, Y_1, ..., Y_n)
     a.s., with e_n a Borel function, for any n ≥ 0. If Y_k = X_k, all k,
     we say {X_n: 0 ≤ n} is a martingale, without qualifying expression.

     a) Show that for a martingale E[X_n] = E[X_0] for all n.

     b) Show that if {X_n: 0 ≤ n} has independent increments (hence is
        Markov) and E[X_n] = E[X_0] all n ≥ 0, the process is a martingale.
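As an illustration of the definition (not a solution to the problem), the following sketch checks the martingale conditions exactly for a symmetric random walk, an assumed example with steps of -1 or +1, each with probability 1/2. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N_STEPS = 4
STEPS = (-1, 1)  # assumed example: each step value has probability 1/2


def mean_position(n: int) -> Fraction:
    """E[X_n] for X_n = sum of the first n steps, by enumerating all 2**n paths."""
    paths = list(product(STEPS, repeat=n))
    return Fraction(sum(sum(p) for p in paths), len(paths))


def conditional_mean_next(prefix: tuple) -> Fraction:
    """E[X_{n+1} | first n steps = prefix], averaging over the next step."""
    current = sum(prefix)
    return Fraction(sum(current + s for s in STEPS), len(STEPS))


# Part a) of the problem: the means E[X_n] are constant in n.
means = [mean_position(n) for n in range(N_STEPS + 1)]

# Condition ii): the conditional mean of X_{n+1} given the past equals X_n.
is_martingale = all(
    conditional_mean_next(prefix) == sum(prefix)
    for n in range(N_STEPS)
    for prefix in product(STEPS, repeat=n)
)
print(means, is_martingale)
```

Because the increments are independent with mean zero, both checks succeed, in line with part b).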






Appendices 






APPENDICES 

Appendix I. Properties of Mathematical Expectation 

Appendix II. Properties of Conditional Expectation,
             Given a Random Vector

Appendix III. Properties of Conditional Independence,
              Given a Random Vector




APPENDIX I. Properties of Mathematical Expectation

E1)  E[I_A] = P(A).

E1a) E[I_M(X)] = P(X ∈ M); E[I_M(X)I_N(Y)] = P(X ∈ M, Y ∈ N) (with extension
     by mathematical induction to any finite number of factors).

E2)  Linearity. E[aX + bY] = aE[X] + bE[Y] (with extension by mathematical
     induction to any finite linear combination).

E3)  Positivity; monotonicity.
     a) X ≥ 0 a.s. implies E[X] ≥ 0, with equality iff X = 0 a.s.
     b) X ≥ Y a.s. implies E[X] ≥ E[Y], with equality iff X = Y a.s.

E4)  Monotone convergence. If X_n → X monotonically a.s., then
     E[X_n] → E[X] monotonically.

E5)  Independence. The pair {X,Y} of random vectors is independent
     iff E[I_M(X)I_N(Y)] = E[I_M(X)]E[I_N(Y)] for all Borel sets M, N
     on the codomains of X, Y, respectively,
     iff E[g(X)h(Y)] = E[g(X)]E[h(Y)] for all real-valued Borel functions
     g, h such that the expectations exist.
E6)  Uniqueness.
     a) Suppose Y is a random vector with codomain R^m and g, h are
        real-valued Borel functions on the range of Y. If
        E[I_M(Y)g(Y)] = E[I_M(Y)h(Y)] for all Borel sets M on the
        codomain of Y, then g(Y) = h(Y) a.s.
     b) More generally, if E[I_M(Y)I_N(Z)g(Y,Z)] = E[I_M(Y)I_N(Z)h(Y,Z)] for
        all Borel sets M, N in the codomains of Y, Z, respectively,
        then g(Y,Z) = h(Y,Z) a.s.

E7)  Fatou's lemma. If X_n ≥ 0 a.s., then E[lim inf X_n] ≤ lim inf E[X_n].

E8)  Dominated convergence. If X_n → X a.s. and |X_n| ≤ Y a.s., for
     each n, with E[Y] finite, then E[X_n] → E[X].



E9)  Countable additivity. Suppose E[X] exists and A = ∪_{i=1}^∞ A_i
     (disjoint union). Then E[I_A X] = Σ_{i=1}^∞ E[I_{A_i} X].

E10) Existence. If E[g(X)] is finite, then there is a real-valued Borel
     function e, unique a.s. [P_Y], such that E[I_M(Y)g(X)] = E[I_M(Y)e(Y)]
     for all Borel sets M in the codomain of Y.

E11) Triangle inequality. |E[g(X)]| ≤ E[|g(X)|].

E12) Mean-value theorem. If a ≤ X ≤ b a.s. on A, then aP(A) ≤ E[I_A X] ≤ bP(A).

E13) Let g be a nonnegative Borel function, defined on the range of X. Let
     A = {ω: g[X(ω)] ≥ a}. Then E[g(X)] ≥ aP(A).

E14) Markov's inequality. If g ≥ 0 and nondecreasing for t ≥ 0 and a ≥ 0,
     then g(a)P(|X| ≥ a) ≤ E[g(|X|)].

E15) Jensen's inequality. If g is a convex function on an interval I which
     includes the range of real random variable X, then g(E[X]) ≤ E[g(X)].

E16) Schwarz' inequality. If X, Y are real or complex random variables with
     E[|X|^2] and E[|Y|^2] finite, then |E[XY]|^2 ≤ E[|X|^2]E[|Y|^2], with
     equality iff there is a constant c such that X = cY a.s.

E17) Hölder's inequality. Let 1 < p, q < ∞ with 1/p + 1/q = 1. If X, Y are
     real or complex random variables with E[|X|^p] and E[|Y|^q] finite,
     then E[|XY|] ≤ E[|X|^p]^{1/p} E[|Y|^q]^{1/q}.

E18) Minkowski's inequality. Let 1 ≤ p < ∞. If X, Y are real or complex
     random variables with E[|X|^p] and E[|Y|^p] finite, then
     E[|X + Y|^p]^{1/p} ≤ E[|X|^p]^{1/p} + E[|Y|^p]^{1/p}.
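A few of the inequalities above can be checked numerically on a small example. The sketch below is only an illustration: the joint distribution of (X, Y) and the helper names are assumptions. It evaluates both sides of E14 (with g(t) = t^2), E15 (with g(t) = t^2), and E16 in exact rational arithmetic.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (X, Y): {(x, y): probability}.
pmf = {(-2, 1): F(1, 8), (0, 3): F(3, 8), (1, -1): F(1, 4), (3, 2): F(1, 4)}


def expect(fn):
    """E[fn(X, Y)] for the finite joint distribution above."""
    return sum(p * fn(x, y) for (x, y), p in pmf.items())


def prob(event):
    """P(event(X, Y)) for the finite joint distribution above."""
    return sum(p for (x, y), p in pmf.items() if event(x, y))


a = 2

# E14) Markov's inequality with g(t) = t**2: g(a) P(|X| >= a) <= E[g(|X|)].
markov_lhs = a**2 * prob(lambda x, y: abs(x) >= a)
markov_rhs = expect(lambda x, y: abs(x) ** 2)

# E15) Jensen's inequality with the convex function g(t) = t**2.
jensen_lhs = expect(lambda x, y: x) ** 2
jensen_rhs = expect(lambda x, y: x**2)

# E16) Schwarz' inequality for real X, Y.
schwarz_lhs = expect(lambda x, y: x * y) ** 2
schwarz_rhs = expect(lambda x, y: x**2) * expect(lambda x, y: y**2)

print(markov_lhs, markov_rhs)
print(jensen_lhs, jensen_rhs)
print(schwarz_lhs, schwarz_rhs)
```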



APPENDIX II. Properties of Conditional Expectation, given a Random Vector 

We suppose, without repeated assertion, that the random vectors and 
Borel functions in the expressions below are such that ordinary expectations 
exist. 

CE1)  e(Y) = E[g(X)|Y] a.s. iff E[I_M(Y)g(X)] = E[I_M(Y)e(Y)] for all
      Borel sets M on the codomain of Y.
CE1a) If P(Y ∈ M) > 0, then E[I_M(Y)e(Y)] = E[g(X)|Y ∈ M]P(Y ∈ M).
CE1b) E[g(X)] = E{E[g(X)|Y]}.

CE2)  Linearity. E[ag(X) + bh(Y)|Z] = aE[g(X)|Z] + bE[h(Y)|Z] a.s. (with
      extension by mathematical induction to any finite linear combination).
CE3)  Positivity; monotonicity.
      g(X) ≥ 0 a.s. implies E[g(X)|Y] ≥ 0 a.s.
      g(X) ≥ h(Y) a.s. implies E[g(X)|Z] ≥ E[h(Y)|Z] a.s.
CE4)  Monotone convergence. X_n → X a.s. monotonically implies
      E[X_n|Y] → E[X|Y] a.s. monotonically.
CE5)  Independence. a) {X,Y} is an independent pair iff
      b) E[I_N(X)|Y] = E[I_N(X)] a.s. for all Borel sets N iff
      c) E[g(X)|Y] = E[g(X)] a.s. for all Borel functions g.

CE6)  e(Y) = E[g(X)|Y] a.s. iff E[h(Y)g(X)] = E[h(Y)e(Y)] for all Borel h.
CE7)  If X = h(Y), then E[g(X)|Y] = g(X) a.s. for all Borel g.
CE8)  E[h(Y)g(X)|Y] = h(Y)E[g(X)|Y] a.s.

CE9)  If Y = h(W), then E{E[g(X)|Y]|W} = E{E[g(X)|W]|Y} = E[g(X)|Y] a.s.
CE9a) E{E[g(X)|Y]|Y,Z} = E{E[g(X)|Y,Z]|Y} = E[g(X)|Y] a.s.
CE9b) If Y = h(W), where h is Borel with a Borel inverse, then
      E[g(X)|Y] = E[g(X)|W] a.s.



CE10) If g is Borel such that E[g(X,v)] is finite for all v on the
      range of Y and E[g(X,Y)] is finite, then
      E[g(X,Y)|Y = u] = E[g(X,u)|Y = u] a.s. [P_Y].
CE11) In CE10), if {X,Y} is an independent pair, then
      E[g(X,Y)|Y = u] = E[g(X,u)] a.s. [P_Y].
CE12) Triangle inequality. |E[g(X)|Y]| ≤ E[|g(X)| | Y] a.s.
CE13) Jensen's inequality. If g is a convex function on an interval I
      which contains the range of real random variable X, then
      g(E[X|Y]) ≤ E[g(X)|Y] a.s.
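For random variables with finitely many values, the properties above reduce to finite sums and can be verified directly. The sketch below is an illustration under assumed data: the joint pmf, the functions g and h, and the helper names are hypothetical. It checks CE1b) and CE8) exactly.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): {(x, y): probability}.
pmf = {(0, 0): F(1, 6), (1, 0): F(1, 3), (0, 1): F(1, 4), (2, 1): F(1, 4)}


def expect(fn):
    """E[fn(X, Y)]."""
    return sum(p * fn(x, y) for (x, y), p in pmf.items())


def cond_expect(fn):
    """Return a dict e with e[y] = E[fn(X, Y) | Y = y]."""
    num, den = defaultdict(F), defaultdict(F)
    for (x, y), p in pmf.items():
        num[y] += p * fn(x, y)
        den[y] += p
    return {y: num[y] / den[y] for y in den}


def g(x):
    return x * x + 1


def h(y):
    return 3 * y - 2


e = cond_expect(lambda x, y: g(x))          # e(y) = E[g(X) | Y = y]

# CE1b): E[g(X)] = E{E[g(X)|Y]}
lhs_ce1b = expect(lambda x, y: g(x))
rhs_ce1b = expect(lambda x, y: e[y])

# CE8): E[h(Y)g(X)|Y] = h(Y)E[g(X)|Y]
lhs_ce8 = cond_expect(lambda x, y: h(y) * g(x))
rhs_ce8 = {y: h(y) * e[y] for y in e}

print(lhs_ce1b, rhs_ce1b, lhs_ce8 == rhs_ce8)
```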






APPENDIX III. Properties of Conditional Independence, given a Random Vector

The following conditions are equivalent:

CI1) E[I_M(X)I_N(Y)|Z] = E[I_M(X)|Z]E[I_N(Y)|Z] a.s. ∀ Borel sets M, N.
CI2) E[I_M(X)|Z,Y] = E[I_M(X)|Z] a.s. ∀ Borel sets M.
CI3) E[I_M(X)I_Q(Z)|Z,Y] = E[I_M(X)I_Q(Z)|Z] a.s. ∀ Borel sets M, Q.
CI4) E[I_M(X)I_Q(Z)|Y] = E{E[I_M(X)I_Q(Z)|Z]|Y} a.s. ∀ Borel sets M, Q.

CI5) E[g(X)h(Y)|Z] = E[g(X)|Z]E[h(Y)|Z] a.s. ∀ Borel functions g, h.
CI6) E[g(X)|Z,Y] = E[g(X)|Z] a.s. ∀ Borel functions g.
CI7) E[g(X,Z)|Z,Y] = E[g(X,Z)|Z] a.s. ∀ Borel functions g.
CI8) E[g(X,Z)|Y] = E{E[g(X,Z)|Z]|Y} a.s. ∀ Borel functions g.

DEFINITION. The pair of random vectors {X,Y} is conditionally independent,
given Z, iff the product rule CI1) holds. An arbitrary class of random
vectors is conditionally independent, given Z, if an analogous product
rule holds for each finite subclass of two or more members of the class.

CI9)  If {X,Y} is conditionally independent, given Z, U = h(X), and
      V = k(Y), with h, k Borel, then {U,V} is conditionally
      independent, given Z.

CI10) If the pair {X,Y} is conditionally independent, given Z, then
      a) E[g(X)h(Y)] = E{E[g(X)|Z]E[h(Y)|Z]} = E[e_1(Z)e_2(Z)]
      b) E[g(X)|Y ∈ N]P(Y ∈ N) = E{E[I_N(Y)|Z]E[g(X)|Z]}.

CI11) If {Y, (X,Z)} is independent, then {X,Y} is conditionally
      independent, given Z.
CI12) If {X,Y} is conditionally independent, given Z, then
      E[g(X,Y)|Y = u, Z = v] = E[g(X,u)|Z = v] a.s. [P_YZ].
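The product rule CI1) can be made concrete with a small discrete model. In the sketch below, which is an assumed example with hypothetical names and numbers, the joint distribution is built as P(Z = z)P(X = x|Z = z)P(Y = y|Z = z), so that {X,Y} is conditionally independent, given Z, by construction. The code confirms the product rule for point events and shows that the pair need not be independent unconditionally.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical ingredients: P(Z = z), P(X = x | Z = z), P(Y = y | Z = z).
p_z = {0: F(1, 3), 1: F(2, 3)}
p_x_given_z = {0: {0: F(1, 2), 1: F(1, 2)}, 1: {0: F(1, 4), 1: F(3, 4)}}
p_y_given_z = {0: {0: F(1, 5), 1: F(4, 5)}, 1: {0: F(2, 3), 1: F(1, 3)}}

joint = {
    (x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
    for x, y, z in product((0, 1), repeat=3)
}


def prob(event):
    """P(event(X, Y, Z)) under the joint distribution."""
    return sum(p for xyz, p in joint.items() if event(*xyz))


def product_rule_holds() -> bool:
    """Check P(X=x, Y=y | Z=z) = P(X=x | Z=z) P(Y=y | Z=z) for all x, y, z."""
    for x0, y0, z0 in product((0, 1), repeat=3):
        pz = prob(lambda x, y, z: z == z0)
        pxy = prob(lambda x, y, z: (x, y, z) == (x0, y0, z0)) / pz
        px = prob(lambda x, y, z: x == x0 and z == z0) / pz
        py = prob(lambda x, y, z: y == y0 and z == z0) / pz
        if pxy != px * py:
            return False
    return True


# Unconditionally, the product rule fails for this example.
p_x1 = prob(lambda x, y, z: x == 1)
p_y1 = prob(lambda x, y, z: y == 1)
p_x1y1 = prob(lambda x, y, z: x == 1 and y == 1)
print(product_rule_holds(), p_x1y1, p_x1 * p_y1)
```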




References 




Ash, Robert B. [1970]: BASIC PROBABILITY THEORY, John Wiley & Sons, New York.

Chung, Kai Lai [1974]: ELEMENTARY PROBABILITY THEORY WITH STOCHASTIC
     PROCESSES, Springer-Verlag, New York.

Çinlar, Erhan [1975]: INTRODUCTION TO STOCHASTIC PROCESSES, Prentice-Hall,
     Inc., Englewood Cliffs, New Jersey.

Gaver, Donald P., and Gerald L. Thompson [1973]: PROGRAMMING AND PROBA-
     BILITY MODELS IN OPERATIONS RESEARCH, Brooks/Cole Publishing Co.,
     Monterey, California.

Hillier, Frederick S., and Gerald J. Lieberman [1974]: INTRODUCTION TO
     OPERATIONS RESEARCH, Second edition, Holden-Day, Inc.,
     San Francisco.

Howard, Ronald A. [1960]: DYNAMIC PROGRAMMING AND MARKOV PROCESSES,
     Technology Press of MIT & John Wiley & Sons, Inc., New York.

Karlin, Samuel, and Howard M. Taylor [1975]: A FIRST COURSE IN STOCHASTIC
     PROCESSES, Second edition, Academic Press, New York.

Mood, Alexander M., Franklin A. Graybill, and Duane C. Boes [1974]:
     INTRODUCTION TO THE THEORY OF STATISTICS, Third edition,
     McGraw-Hill Book Company, New York.

Parzen, Emanuel [1962]: STOCHASTIC PROCESSES, Holden-Day, Inc.,
     San Francisco.

Pfeiffer, Paul E., and David A. Schum [1973]: INTRODUCTION TO APPLIED
     PROBABILITY, Academic Press, New York.

Renyi, A. [1970]: PROBABILITY THEORY, American Elsevier Publishing
     Company, Inc., New York.

Schum, David A., and Paul E. Pfeiffer [1973]: "Observer Reliability and
     Human Inference," IEEE Transactions on Reliability, vol R-22,
     no. 3, August, 1973, pp 170-176.

Schum, David A., and Paul E. Pfeiffer [1977]: "A Likelihood Ratio Approach
     to Classification Problems using Discrete Data," Organizational
     Behavior and Human Performance 19, 207-225, August, 1977.

Scott, D. W., L. Factor, and G. A. Gorry [1978]: A Model for Predicting
     the Distribution of Response Time for an Urban Ambulance System
     (to appear).







Selected Answers, Hints, and Key Steps




A-1  i) F(X) = {∅, A, A^c, Ω}   ii) F(X) = {∅, A, B, C, A∪B, A∪C, B∪C, Ω}

A-2't) x^a-soj),. Ay b iii) x -1 ((-«,3]) - Ay By c - d c y 

A " 3 8 1^ 8 3 fP xl» but * 3 * « 2 a,s. [P x l 

A-4  a) g is cont. (draw graph), hence Borel, all n

A-8  a) X ≥ 0, A = ∪_{i=1}^∞ A_i (disjoint) implies I_A = lim_n Σ_{i=1}^n I_{A_i} implies
        Σ_{i=1}^n I_{A_i} X increases to I_A X. Use linearity, monotone convergence.

B-6  i) implies P(AB|D) = P(A|D)P(B|D)     ii) implies P(AH|D) = P(A|D)P(H|D)
     iii) implies P(BH|D) = P(B|D)P(H|D)   iv) implies P(ABH|D) = P(AB|D)P(H|D)



B^7 ' 576/228 

B-8tb) PCclTjT^/RCC^T^) = 64/99' c) PCcIt^/PCC^T*) » 16/*n 
B-9  P(W|Q)/P(W^c|Q) = 1/3 implies P(W|Q) = 1/4     P(Q) = 1/2
     P(W|Q^c)/P(W^c|Q^c) = 3/2 implies P(W|Q^c) = 3/5

V PIW[AB C ) _ P(Q)P( W [Q)pqlQ )P,(B C lQ) + P(Q C )P(W |q c )p ( A | o c )p(b c \q C ) 41 

p(w*|ab c )\ p{q)p(w c |q)p(a|q)p(b c |q) +p(q c )p(w c |q c )p(a|q c )p(b c Jq c ) " 7 A 

B-10 {T,B} is conditionally independent, given A, and given A^c
     P(AT) = 0.54   P(AT^c) = 0.12   P(A^cT) = P(T) - P(AT) = 0.06   P(A^cT^c) = 0.28
     P(T|B)/P(T^c|B) = [P(AT)P(B|A) + P(A^cT)P(B|A^c)] / [P(AT^c)P(B|A) + P(A^cT^c)P(B|A^c)]
                     = 342/156

B-ll b) P(D^)«0.2 P(X|D l )-0J. P(I C |dJ)«0.2 ^U^) - 0.96 * 

P(D 2 |l C D^) » 0 implies P(D*I C D ) -0 IC « ID « 0- * * ' 
e 1 2 * , i 2 

Hence,, PO^) - PO^ID^) + PCdJi^) - 0 
f i( - |c) I F(D I )Fa C |VF<D 2 ll C D 1 )F(cU C D 1 D ? ) - ^ 

«, 2 4 ^d 1 >(i c |d 1 )pcc!i c ) + p(d^)p(i c |d c )p(c|i c ) * 340 

*-l*E p - » 10 D 2i D 31 D 42 D 50 D 61 \ " 3 ' 29D > m °- 201 Classify in group 1 




C-3  E[I_A g(X)] = E[I_{AB} g(X)] + E[I_{AB^c} g(X)] = E[g(X)|AB]P(AB) + E[g(X)|AB^c]P(AB^c)
C-7  A = {X^2 + Y^2 ≤ 1} = {(X,Y) ∈ Q}. Since Z = X on A, we have
     E[Z|A]P(A) = E[I_Q(X,Y)X] = 0 (by evaluation of integral). Also
E[Z|A C ]P(A C ) » E[l A cc] • cP(A C ). Hence E[z] « 0 + cP(A C ) - c(l - £) 
C-8  a) E[X^2 + Y^2|X = t] = t^2 + E[Y^2|X = t] = (3t^2 + 4)/2     1 ≤ t ≤ 2
bj E[xy|x • t] - I t(t 2 + 2t + 4)/(t + 2) 1 < t < 2 

c) E[X|X < \ (Y + 1)] - E[XI Q (X,Y)]/E[I Q (X,Y)] - T^JJ-* f||- ~ 1.19 
% C-9 a) E[X 2 + Y 2 ^ - t] - t 2 + E[Y 2 ] « t 2 + 7/6 -1,< t < » 

b) n E[XY|X » t] « tE[Y] • t/4 -1 < t < • 
C-10 E[g(X,Y)|Y = u, Z = v] = E[g*(X,Y,Z)|Y = u, Z = v]
                            = E[g*(X,u,v)|Y = u, Z = v]     by CE10)
                            = E[g(X,u)|Y = u, Z = v]

C-11 E[g(X,Y)|Z = v] = E[e(Y,Z)|Z = v] = E[e(Y,v)|Z = v]
        = ∫ e(u,v) dF_{Y|Z}(u|v) = ∫ E[g(X,Y)|Y = u, Z = v] dF_{Y|Z}(u|v)
        = ∫ E[g(X,u)|Y = u, Z = v] dF_{Y|Z}(u|v)     by Prob C-10

C-12 a) v(Y) = E[X^2 - 2e(Y)X + e^2(Y)|Y] = E[X^2|Y] - 2e(Y)E[X|Y] + e^2(Y)

     c) E[v(Y)] + Var[e(Y)] = E{E[X^2|Y]} - E[e^2(Y)] + E[e^2(Y)] - E^2[X]
                            = E[X^2] - E^2[X]
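The identity in C-12 c), Var[X] = E[v(Y)] + Var[e(Y)], can be checked on a finite example. The following sketch is illustrative only: the joint pmf and the helper names are assumptions. It computes e(y) = E[X|Y = y] and v(y) = E[X^2|Y = y] - e^2(y) exactly and compares the two sides.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y): {(x, y): probability}.
pmf = {(0, 0): F(1, 6), (1, 0): F(1, 3), (0, 1): F(1, 4), (2, 1): F(1, 4)}

# Marginal pmf of Y.
p_y = defaultdict(F)
for (x, y), p in pmf.items():
    p_y[y] += p


def cond_moment(k: int) -> dict:
    """Return {y: E[X**k | Y = y]}."""
    num = defaultdict(F)
    for (x, y), p in pmf.items():
        num[y] += p * x**k
    return {y: num[y] / p_y[y] for y in p_y}


e = cond_moment(1)                                  # e(y) = E[X | Y = y]
v = {y: cond_moment(2)[y] - e[y] ** 2 for y in p_y}  # conditional variance

mean_x = sum(p * x for (x, y), p in pmf.items())
var_x = sum(p * x**2 for (x, y), p in pmf.items()) - mean_x**2

mean_v = sum(p_y[y] * v[y] for y in p_y)                       # E[v(Y)]
var_e = sum(p_y[y] * e[y] ** 2 for y in p_y) - mean_x**2       # Var[e(Y)]

print(var_x, mean_v, var_e)
```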

C-13 a) E[D|N = n]P(N = n) = E[I_{{n}}(N)D] = E[I_{{n}}(N)Y_n] since D = Y_n on
        N^{-1}({n}). This implies E[D|N = n] = E[Y_n] = nE[X]
        Var[D|N = n] = E[D^2|N = n] - e^2(n) = E[Y_n^2] - E^2[Y_n]

     c) Var[D] = E[v(N)] + Var[e(N)] = E{N Var[X]} + Var{N E[X]}.
        Var[X] and E[X] are constants.

     d) φ_D(u) = E{E[e^{iuD}|N]}.
        E[e^{iuD}|N = n]P(N = n) = E[I_{{n}}(N)e^{iuD}] = E[I_{{n}}(N)e^{iuY_n}]
                                 = P(N = n)φ_{Y_n}(u) = P(N = n)φ_X^n(u)
        φ_D(u) = Σ_n P(N = n)φ_X^n(u) = g_N[φ_X(u)]
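The random-sum formulas in C-13 can be confirmed by brute-force enumeration on a small case. The sketch below assumes hypothetical distributions for N and for the independent, identically distributed summands X_i, builds the exact distribution of D = X_1 + ... + X_N, and compares its mean and variance with E[N]E[X] and E[N]Var[X] + Var[N]E^2[X].

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions: N counts the terms, X is a typical summand.
p_n = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}
p_x = {1: F(2, 3), 4: F(1, 3)}


def mean_var(pmf: dict) -> tuple:
    """Return (mean, variance) of a finite pmf {value: probability}."""
    mean = sum(p * v for v, p in pmf.items())
    second = sum(p * v * v for v, p in pmf.items())
    return mean, second - mean**2


# Exact distribution of D, with N independent of the summands.
p_d = defaultdict(F)
for n, pn in p_n.items():
    for xs in product(p_x, repeat=n):   # all value sequences (x_1, ..., x_n)
        p = pn
        for x in xs:
            p *= p_x[x]
        p_d[sum(xs)] += p

mean_n, var_n = mean_var(p_n)
mean_x, var_x = mean_var(p_x)
mean_d, var_d = mean_var(p_d)

print(mean_d, mean_n * mean_x)
print(var_d, mean_n * var_x + var_n * mean_x**2)
```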


C-15 a) 0 ≤ Var[e(Y)]/Var[X] = Var[e(Y)]/{Var[e(Y)] + E[v(Y)]} ≤ 1
        since v(Y) ≥ 0 a.s.
     d) Set X* = (X - E[X])/σ[X] and Y* = {g(Y) - E[g(Y)]}/σ[g(Y)]
        ρ^2[X,g(Y)] = E^2[X*Y*] = E^2{E[X*Y*|Y]} = E^2{Y*E[X*|Y]}     by CE8)
                    ≤ E[(Y*)^2]E{E^2[X*|Y]}     by E16)
                    = E(E^2{X - E[X]|Y})/Var[X] = Var[e(Y)]/Var[X] = K^2
D-1  By CI11), {X,Y} is conditionally independent, given Z. Hence
     E[g(X)h(Y)|Z] = E[g(X)|Z]E[h(Y)|Z] = E[g(X)]E[h(Y)|Z]     by CE5)
Vd-2 i) %lw (ujt r ; ...,t ft ) - .<>;»■/ /JO e (n-Du ^ ^(t^.... ^ 

tfHjw - tl t n ] - t/n-^O/fe^-OtO , d ] 1/(n . 1) Vn > 2 

ii) e[h[w - k' . . , k ] - <m + k)/ <X + n) k = k. + Jc. + ; . . 4, k # - / 

4 ^ 11 Y 1-2* -Yi; 3' „ 

1U) E[H|W « k^...., k n J «*\n + l)/(n + k + 2) • ^ 
' D-3 E[H] « 2/3 * E[H|s^ 0 - 8] « 8/11* Var[H] * 2/117 [ Var[H|s l0 '* 8} = 24/3783 
D-4 E[l M (H)D] » 2 E t I ( n j< N ) I M (H)Y n 1 ' J ? < N a ii)E(I m (H)e[yJh] J 
D-5  a) By CI11), {V,D} is conditionally independent, given N.
        P(W ≤ t, N = n) = E[I_Q(D,V)I_{{n}}(N)] = E{I_{{n}}(N)E[I_Q(D,V)|V,N]}
        By CI12), E[I_Q(D,V)|V = v, N = n] = E[I_Q(D,v)|N = n] = P(D ≤ vt|N = n)
        P(W ≤ t, N = n) = Σ_k ∫ I_{{n}}(k)P(D ≤ vt|N = k) dF_V(v) P(N = k)
                        = P(N = n) ∫ P(D ≤ vt|N = n) dF_V(v)

b) P(w < t|N - n) - 20 at, 0 < t « l/20a a 2 » nn/A / 

c) P( W < t [N - n) - 1 + y^j- (e" 15at - e" 25at ) 0 < t, a 2 « nn/A 
■D-6 p x (2B^) ^6.0370' R( 10.26) - - 11.15 R(20>26) « - 46.40 

R(30,26) - - 35.35 Optimum a - 20. * 
D-7  ℓ(a,u) = 1 - 2p(a,u), where
     p(a,u) = P(Y = a-1|H = u) + P(Y = a|H = u) + P(Y = a+1|H = u)
            = C(n,a-1)u^{a-1}(1 - u)^{n-a+1} + C(n,a)u^a(1 - u)^{n-a}
              + C(n,a+1)u^{a+1}(1 - u)^{n-a-1}

     R(a,x) = E[ℓ(a,H)P(X = x|H)]/P(X = x)     P(X = x) = 1/(m+1)
            = 1 - [2(m+1)C(m,x)/(n+m+1)] K(a,x)

     To minimize R(a,x), maximize with respect to a the function

     K(a,x) = C(n,a-1)/C(n+m,a+x-1) + C(n,a)/C(n+m,a+x) + C(n,a+1)/C(n+m,a+x+1)

     For n = 10, m = 3, x = 2,  K(5,2) = 0.4324   K(6,2) = 0.4779
     K(7,2) = 0.4883   K(8,2) = 0.4534   K(9,2) = 0.3625
     Optimum R(7,2) = 1 - (12/7)K(7,2) = 0.1628
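The tabulated values can be rechecked with a few lines of code. The sketch below is a reconstruction under stated assumptions: K(a,x) is taken to be the sum of the three ratios C(n,j)/C(n+m,j+x) for j = a-1, a, a+1, and the coefficient linking R to K assumes H uniform on [0,1] with X binomial (m, H), which is consistent with P(X = x) = 1/(m+1). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def K(a: int, x: int, n: int, m: int) -> Fraction:
    """Sum of C(n, j) / C(n + m, j + x) over j = a-1, a, a+1 (assumed form)."""
    return sum(
        Fraction(comb(n, j), comb(n + m, j + x))
        for j in (a - 1, a, a + 1)
        if 0 <= j <= n
    )


def risk(a: int, x: int, n: int, m: int) -> Fraction:
    """R(a, x) = 1 - [2(m+1)C(m,x)/(n+m+1)] K(a, x), assuming H uniform on [0,1]."""
    coeff = Fraction(2 * (m + 1) * comb(m, x), n + m + 1)
    return 1 - coeff * K(a, x, n, m)


n, m, x = 10, 3, 2
table = {a: K(a, x, n, m) for a in range(5, 10)}
best_a = max(table, key=table.get)

for a, value in table.items():
    print(a, f"{float(value):.4f}")
print("optimum a =", best_a, " R =", f"{float(risk(best_a, x, n, m)):.4f}")
```

Exact rational arithmetic is used so that the rounded figures can be compared with the printed table without floating-point doubt.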
D-8  Strategy: 1st stage: risk. φ_1(0,0) = 47/12 = expected gain for strategy
               2nd stage: If successful, φ_2(1,1), then risk
                          If unsuccessful, φ_2(1,0), then play safe
               3rd stage: φ_3(2,2) indicates risk
                          φ_3(2,1) indicates risk
                          φ_3(1,0) indicates safe

E-1  b) {T_1 = k} = {U_k ∈ M^c × M^c × ... × M^c × M ⊂ S^{k+1}} = A_k,  M = {i}
E-2  E[I_Q(U_T)g(Y)] = E[g(Y) Σ_k I_{M_k}(U_k)I_{Q(k)}(U_k)]

E-3  a) From problem E-2,
        E[g(W_{T,T+n})|U_T] = Σ_k E[g(W_{k,k+n})|U_k]I_{M_k}(U_k)
                            = Σ_k E[g(W_{k,k+n})|X_k]I_{M_k}(U_k)     by Markov property

Efg(w T l4ii )i M (x I )] -s Wlz% t ^WK\0> k )) 

■ ^nce Bfg(W T - | H.^)!^^^) .... 






E-4  a) E[X_{n+1}] = E{E[X_{n+1}|Y_0, Y_1, ..., Y_n]} = E[X_n]
     b) E[X_{n+1}|X_0, ..., X_n] = E[X_{n+1} - X_n|X_0, ..., X_n] + X_n
                                 = E[X_{n+1} - X_n] + X_n     by CE2), CE5), and CE7)
                                 = 0 + X_n